Title of dissertation: COMPARING STRENGTH OF LOCALITY
FOR THE MISS RATES AND OUTPUTS OF CACHES


Sarut  Vanichpun,  Doctor  of  Philosophy,  2005 


Dissertation  directed  by:  Professor  Armand  M.  Makowski 

Department  of  Electrical  and  Computer  Engineering  and 
Institute  for  Systems  Research 


The  performance  of  demand-driven  caching  is  known  to  depend  on  the  locality  of 
reference  exhibited  by  the  stream  of  requests  made  to  the  cache.  In  spite  of  numerous 
efforts,  no  consensus  has  been  reached  on  how  to  formalize  this  notion,  let  alone  on 
how  to  compare  streams  of  requests  on  the  basis  of  their  locality  of  reference.  We  take 
on  this  issue  with  an  eye  towards  validating  operational  expectations  associated  with  the 
notion  of  locality  of  reference.  We  focus  on  two  “folk  theorems,”  that  is,  (i)  The  stronger 
the  locality  of  reference,  the  smaller  the  miss  rate  of  the  cache;  and  (ii)  Good  caching 
is  expected  to  produce  an  output  stream  of  requests  exhibiting  less  locality  of  reference 
than  the  input  stream  of  requests.  These  two  folk  theorems  are  explored  in  the  context 
of  demand-driven  caching  for  the  two  main  contributors  of  locality  of  reference,  namely 
popularity  and  temporal  correlations. 

We first focus exclusively on popularity by considering the situation where there
are no temporal correlations in the stream of requests, as would be the case under the
Independent Reference Model (IRM). As we propose to measure the strength of locality
of reference in a stream of requests through the skewness of its popularity distribution,
we introduce the notion of majorization as a means of capturing this degree of skewness.
We show that these folk theorems hold for caches operating under a large class
of replacement policies, the so-called Random On-demand Replacement Algorithms
(RORA), which includes the optimal policy $A_0$ and the random policy. However,
counterexamples prove that this is not always the case under the (popular)
Least-Recently-Used (LRU) and CLIMB policies. In such cases, conjectures are offered
(and supported by simulations) as to when the folk theorems would hold under LRU or
CLIMB caching, given that the IRM input has a Zipf-like popularity pmf.

To compare the strength of temporal correlations in streams of requests, we define
the notion of Temporal Correlations (TC) ordering based on the so-called supermodular
ordering, a concept of positive dependence which has been successfully used for
comparing dependence structures in sequences of random variables. We explore how the TC
ordering captures the strength of temporal correlations in several Web request models,
namely the higher-order Markov chain model (HOMM), the partial Markov chain model
(PMM) and the Least-Recently-Used stack model (LRUSM). We establish the folk
theorem to the effect that the stronger the temporal correlations, the smaller the
miss rate for the PMM under certain assumptions on the caching policy. Conjectures
and simulations are offered as to when this folk theorem would hold under the HOMM
and under the LRUSM. In addition, the validity of this folk theorem for general request
streams under the Working Set algorithm is studied.

Lastly,  we  investigate  how  the  majorization  and  TC  orderings  can  be  translated  into 
comparisons  of  three  well-known  locality  of  reference  metrics,  namely  the  working  set 
size,  the  inter-reference  time  and  the  stack  distance. 
Chapter 1


Introduction 

1.1  Web  caching 

Web  caching  aims  to  reduce  network  traffic,  server  load  and  user-perceived  retrieval 
latency  by  replicating  “popular”  content  on  (proxy)  caches  that  are  strategically  placed 
within  the  network.  This  approach  is  a  natural  outgrowth  of  caching  techniques  which 
were  originally  developed  for  computer  memory  and  distributed  file  sharing  systems, 
e.g.,  [2,  24]  (and  references  therein). 

Since  its  inception,  the  World  Wide  Web  has  seen  an  exponential  increase  in  the 
number  of  its  users  and  in  the  volume  of  objects  to  be  accessed.  This  trend,  which 
is  not  likely  to  abate  anytime  soon,  is  challenging  current  cache  architectures  to  meet 
the complementary mandates of speed, scalability and reliability which are central to
delivering  a  satisfactory  user  experience. 

Generally speaking, scalability requires some form of hierarchical organization. In
the context of Web caching, this notion has led naturally to the deployment of
multi-layered systems of interconnected caches which may be organized in a tree-like
hierarchy or in more complicated meshes [12, 16, 29] (and references therein).

Even a cursory review of the literature [5, 54, 69] already reveals the large number
of difficult and challenging issues that need to be addressed in order to ensure proper
operation of these distributed multi-level caching systems. Examples of these issues
include (i) cache replacement strategies [15, 39, 54, 55]; (ii) prefetching algorithms [25]
(and references therein); (iii) cache location [43, 44]; (iv) content placement [23, 57, 68];
and (v) cache cooperation techniques [16, 17, 30].

1.2  Locality  of  reference 

Although these challenges have renewed interest in caching in general, some basic
issues are still not well understood. Indeed, the performance of any form of caching is
determined by a number of factors, chief amongst them the statistical properties of the
streams of requests made to the cache. One important such property is the locality of
reference present in a stream of requests whereby “bursts of references are made in the
near future to objects referenced in the recent past.”

The notion of locality and its importance for caching were first recognized by Belady
[10] in the context of computer memory, and attempts at characterization were made
early on by Denning through the working set model [26, 27]. Subsequently, a number
of studies have shown that request streams for Web objects exhibit strong locality of
reference¹ [40, 41, 46] and various metrics have been proposed for characterizing the
locality of reference in Web request streams [1, 34, 40].

Although several competing definitions for locality of reference are available, it is by
now widely accepted that the two main contributors to locality of reference are temporal
correlations in the streams of requests and the popularity distribution of requested
objects. To describe these two sources of locality, and to frame the subsequent discussion,
we assume the following generic setup: We consider a universe of $N$ cacheable items
or documents, labeled $i = 1, \ldots, N$, and we write $\mathcal{N} = \{1, \ldots, N\}$. The successive
requests arriving at the cache are modeled by a sequence $R = \{R_t, \ t = 0, 1, \ldots\}$ of
$\mathcal{N}$-valued rvs.

¹ At least on short timescales.

1. The popularity of the sequence of requests $\{R_t, \ t = 0, 1, \ldots\}$ is defined as the
pmf $p = (p(1), \ldots, p(N))$ on $\mathcal{N}$ given by

$$ p(i) := \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[R_\tau = i] \quad \text{a.s.}, \qquad i = 1, \ldots, N $$

whenever these limits exist (and they do in most models treated in the literature).
Popularity is usually viewed as a long-term expression of locality which captures the
likelihood that a document will be requested in the future relative to other documents.

2. Temporal correlations are more delicate to define due to the “categorical” nature
of the requests $\{R_t, \ t = 0, 1, \ldots\}$. Indeed, it is somewhat meaningless to use the
covariance function

$$ \gamma(s,t) := \mathrm{Cov}[R_s, R_t], \qquad s, t = 0, 1, \ldots $$

as a way to capture these temporal correlations as is traditionally done in other contexts.
This is because the rvs $\{R_t, \ t = 0, 1, \ldots\}$ take values in a discrete set whose labels
are arbitrary: We took $\{1, \ldots, N\}$ but could have selected another set of $N$ labels instead;
in fact any set of $N$ distinct points in an arbitrary space would do the job. Thus, the
actual values of the rvs $\{R_t, \ t = 0, 1, \ldots\}$ are of no consequence, and the focus should
instead be on the recurrence patterns exhibited by requests for particular documents
over time. The literature contains several metrics for doing this, e.g., the inter-reference
time [34, 40, 53], the working set size [26, 27] and the stack distance [1, 3, 50].
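The two notions above can be illustrated on a short finite trace. The following is a minimal sketch, not part of the original development: the function names, the toy stream and the relabeling map are all hypothetical, and the empirical frequency over a finite trace only approximates the almost-sure limit that defines the popularity pmf. It shows that the inter-reference times are unchanged when documents are relabeled, which is the sense in which the actual values of the requests are of no consequence.

```python
from collections import Counter

def empirical_popularity(requests, n_items):
    """Fraction of requests made to each item 1..n_items (finite-trace stand-in for p(i))."""
    counts = Counter(requests)
    total = len(requests)
    return [counts.get(i, 0) / total for i in range(1, n_items + 1)]

def inter_reference_times(requests):
    """Gaps between successive requests to the same item, in order of occurrence."""
    last_seen = {}
    gaps = []
    for t, item in enumerate(requests):
        if item in last_seen:
            gaps.append(t - last_seen[item])
        last_seen[item] = t
    return gaps

# Hypothetical toy stream over N = 3 documents, and an arbitrary relabeling of it.
stream = [1, 2, 1, 1, 3, 2, 1, 3]
relabel = {1: 30, 2: 10, 3: 20}
relabeled = [relabel[r] for r in stream]

print(empirical_popularity(stream, 3))    # [0.5, 0.25, 0.25]
print(inter_reference_times(stream))      # [2, 1, 4, 3, 3]
print(inter_reference_times(relabeled))   # same gaps: labels do not matter
```

A covariance computed on `stream` and on `relabeled` would in general differ, whereas the recurrence pattern is identical.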
1.3 Folk theorems


Like the notion of burstiness used in traffic modeling, locality of reference, while
endowed with a clear intuitive content, admits no simple definition. Not surprisingly, in
spite of numerous efforts, no consensus has been reached on how to formalize the
notion, let alone on how to compare streams of requests on the basis of their locality of
reference.² In addition, lacking in most of the work done thus far is a clear recognition
of the system-wide nature of Web caching, whereby local transformative actions shape
the streams of requests as they pass through successive caches.³ These problems have
precluded a formal study of the following “folk theorems”:

1.  Folk  theorem  on  miss  rates  -  The  stronger  the  locality  of  reference  in  the  stream 
of  requests,  the  smaller  the  miss  rate,  since  the  cache  ends  up  being  populated 
by  objects  with  a  higher  likelihood  of  access  in  the  near  future.  Such  a  property, 
if  true,  would  confirm  the  central  role  played  by  locality  of  reference  in  shaping 
cache  performance.  In  fact,  the  very  presence  of  locality  of  reference  in  the  stream 
of  requests  is  what  makes  caching  at  all  possible;  and 

2.  Folk  theorem  on  output  streams  -  Good  cache  replacement  strategies  “absorb” 
locality  of  reference  to  a  certain  extent  by  producing  a  stream  of  misses  from 
the  cache  -  its  so-called  output  -  which  exhibits  less  locality  of  reference  than 
the input stream of requests. In the context of multi-level caching, this reduction
property is often perceived as one of the main reasons why caching loses its
effectiveness after some level in a hierarchy of caches.

² Exceptions can be found in [34, 65].

³ Recent work on this issue can be found in [17, 30, 32] for cache management and in
[47, 70, 71] for Web traffic analysis.
Such folk theorems are expected to hold for demand-driven caching that exploits
recency of reference. Interest in establishing them under specific definitions of locality
of reference stems from a desire to validate their operational significance for caching
systems. Counterexamples would cast some doubt as to whether a particular definition
indeed captures the intuitive meaning of locality of reference, and as to whether a
particular caching algorithm is indeed a well-behaved policy.

1.4  Contributions 

In this dissertation, we identify notions of locality of reference which are capable of
comparing the strength of locality of reference between streams of requests. Such
notions allow a comparison statement of the form

$$ R^1 \leq_{LR} R^2 \tag{1.1} $$

to the effect that “a request stream $R^1$ has less locality of reference than a request stream
$R^2$” under some appropriate notion of locality of reference. With the comparison (1.1),
we are able to formally investigate the folk theorems mentioned above, albeit in a simple
framework under demand-driven cache replacement policies. Indeed, the folk theorem
for miss rates can be formalized as

$$ M_\pi(R^2) \leq M_\pi(R^1) \quad \text{whenever (1.1) holds} \tag{1.2} $$

where $M_\pi(R^1)$ and $M_\pi(R^2)$ denote the miss rates of the request streams $R^1$ and $R^2$
under the cache replacement policy $\pi$, respectively, while the folk theorem for output
streams simply states that

$$ R^* \leq_{LR} R \tag{1.3} $$

where $R^*$ is the output stream of the cache operating under the policy $\pi$ when the input
stream is $R$.
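To make the two objects appearing in (1.2) and (1.3) concrete, the sketch below simulates a small demand-driven cache and returns both its miss rate and its output stream (the subsequence of requests that missed). It is only an illustration under stated assumptions: the random-eviction rule, the i.i.d. request generator, the Zipf-like exponent 1.2, the sizes and all function names are hypothetical choices made here, not taken from the text.

```python
import random

def simulate_random_cache(requests, cache_size, rng):
    """Demand-driven cache with uniformly random eviction.

    Returns (miss_rate, output_stream), where the output stream is the
    subsequence of requests that were not found in the cache.
    """
    cache = set()
    output = []
    for item in requests:
        if item in cache:
            continue                       # hit: cache content is unchanged
        output.append(item)                # miss: the request is forwarded
        if len(cache) >= cache_size:
            cache.remove(rng.choice(sorted(cache)))   # evict a random resident
        cache.add(item)
    return len(output) / len(requests), output

def irm_stream(pmf, length, rng):
    """i.i.d. requests over items 1..N drawn according to pmf."""
    items = list(range(1, len(pmf) + 1))
    return rng.choices(items, weights=pmf, k=length)

rng = random.Random(0)
n_items, cache_size, length = 20, 5, 50_000

uniform = [1 / n_items] * n_items
weights = [i ** -1.2 for i in range(1, n_items + 1)]   # assumed Zipf-like shape
skewed = [w / sum(weights) for w in weights]

miss_uniform, _ = simulate_random_cache(irm_stream(uniform, length, rng), cache_size, rng)
miss_skewed, out = simulate_random_cache(irm_stream(skewed, length, rng), cache_size, rng)

print(miss_uniform, miss_skewed)   # the skewed input should miss less often
print(len(out))                    # length of the output stream = number of misses
```

In this run the more skewed input is expected to yield the smaller miss rate, in line with (1.2); the list `out` is the stream that a second-level cache would see.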
The tasks above have been carried out separately for the two main sources of locality
of reference, namely popularity and temporal correlations. We now summarize the
corresponding results in some detail.

1.4.1  Majorization  and  popularity 

We first focus exclusively on popularity as a way to formalize (1.1). To isolate its
contributions, we consider the situation where there are no temporal correlations in the
stream of requests, as would be the case under the standard Independent Reference Model
(IRM). More precisely, under the IRM with popularity pmf $p = (p(1), \ldots, p(N))$, the
requests $\{R_t, \ t = 0, 1, \ldots\}$ form a sequence of i.i.d. $\mathcal{N}$-valued rvs, each distributed
according to the pmf $p$. Even in the absence of temporal correlations, locality of reference
is present, in that the skewness of $p$ acts as an indicator of the strength of locality of
reference under the intuition that the more “balanced” the pmf $p$, the weaker the locality
of reference.

In a recent paper, Fonseca et al. [34] introduced a notion of comparison based on the
entropy of the popularity pmfs, i.e., the pmf $p$ is considered to be less skewed (or more
balanced) than the pmf $q$ whenever the entropy of $p$ is greater than the entropy of $q$.
Unfortunately, this notion is not strong enough to allow for results of the forms (1.2) and
(1.3) to be established. Here, the degree of skewness in the popularity pmf is captured
formally through the notion of majorization (ordering) [Chapter 2]. This concept has
been used previously in the context of caching by van den Berg and Towsley [65]. With
this notion, the comparison (1.2) can be recast as saying that the miss rate (as a function
of popularity) belongs to the rich and structured class of monotone functions associated
with majorization, the so-called Schur-convex/concave functions. Moreover, basic facts
regarding majorization enable us to develop generic comparison results between the
popularity pmfs of the input and output streams [Chapter 6].
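The gap between the entropy comparison and majorization can be seen on a small example. The two pmfs below are hypothetical and chosen here only for illustration; entropies are in bits. Entropy always ranks two pmfs, whereas majorization, which compares the partial sums of the components arranged in decreasing order, may leave them unordered.

```latex
\begin{align*}
p &= \left(\tfrac12, \tfrac14, \tfrac14\right), &
H(p) &= \tfrac12 \log_2 2 + 2 \cdot \tfrac14 \log_2 4 = 1.5, \\
q &= \left(\tfrac25, \tfrac25, \tfrac15\right), &
H(q) &= 2 \cdot \tfrac25 \log_2 \tfrac52 + \tfrac15 \log_2 5 \approx 1.522 .
\end{align*}
% Entropy declares q less skewed than p, since H(q) > H(p).
% Partial sums of the components in decreasing order:
\begin{align*}
p_{[1]} = 0.5 &> 0.4 = q_{[1]}, &
p_{[1]} + p_{[2]} = 0.75 &< 0.8 = q_{[1]} + q_{[2]} ,
\end{align*}
% so neither p \prec q nor q \prec p holds.
```

In other words, an entropy comparison can hold between two pmfs that are not comparable in the majorization ordering.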

Equipped with the notion of majorization ordering, the folk theorems for the miss
rates and output streams can be established for a number of policies, namely the optimal
policy $A_0$, the random policy and the FIFO (First-In/First-Out) policy [Chapter 6]. These
positive results are then extended to a very large class of replacement policies, the
so-called Random On-demand Replacement Algorithms (RORA) [Chapter 7].

However, these folk theorems do not always hold under two self-organizing policies,
namely the LRU (Least-Recently-Used) and CLIMB replacement policies [Chapter 8].
We first exhibit situations where under these policies, the IRM stream with more skewed
popularity pmf may have a larger miss rate than the IRM stream with less skewed
popularity pmf. Yet, when the popularity pmfs are Zipf-like [Section 6.2], simulations
show  that  the  comparison  (1.2)  under  these  policies  does  hold.  We  formally  establish 
this  fact  only  in  the  limiting  regime  where  the  skewness  parameter  of  the  Zipf-like  pmf 
is  large,  i.e.,  highly  skewed. 

It also happens that the LRU and CLIMB policies fail to reduce locality of reference
in that under these policies, the input popularity pmf $p$ (of $R$) is not necessarily
more skewed than the output popularity pmf $p^*$ (of $R^*$). We explore the issue through
counterexamples which are developed within some classes of input popularity pmfs. In
particular, when the input popularity pmf lies in the class of Zipf-like pmfs, we identify
a condition involving the cache size and the number of cacheable documents under
which reduction fails to occur at large enough values of the skewness parameter of the
input Zipf-like pmf. Under this condition, which we expect to be satisfied in practice,
we show that the output pmf $p^*$ may not exhibit less locality of reference than the input
pmf $p$ when the latter has too much of it to begin with. Additional simulations were
carried out and suggest conjectures as to when the LRU and CLIMB policies indeed reduce
locality of reference with Zipf-like input pmfs. All indications point to the possibility
that for small enough cache sizes, the desired folk theorem will hold.

1.4.2  Positive  dependence  and  temporal  correlations 

As mentioned earlier, the categorical nature of the requests $\{R_t, \ t = 0, 1, \ldots\}$ makes
it difficult to define appropriate notions of temporal correlations. Even though several
metrics have been proposed, e.g., the inter-reference time, the working set size and the
stack distance, none has been found appropriate for formalizing these folk theorems.

We take on this issue by applying the concepts of positive dependence [Chapter 3]
to capture the strength of temporal correlations exhibited by streams of requests.
Positive dependence has been used previously in a number of contexts, e.g., network traffic
and queueing theory [8, 9, 66], and reliability theory [6, 60]. Specifically, relying on
the notion of supermodular ordering [Definition 3.4] which has been used to compare
dependence structures in sequences of rvs, we define the Temporal Correlations (TC)
ordering [Definition 9.1] as a way to compare streams of requests on the basis of the
strength of their temporal correlations. This new ordering is well suited for comparing
the relative strength of temporal correlations as we note that request streams comparable
in the TC ordering must have the same popularity profiles (under the assumption
that they exist); in other words, the TC ordering cannot capture any contribution from
popularity toward locality of reference.

We apply the TC ordering to capture the strength of temporal correlations present
in several Web request models that are believed to exhibit such correlations, namely the
higher-order Markov chain model (HOMM), the partial Markov chain model (PMM)
and the Least-Recently-Used stack model (LRUSM). Indeed, we demonstrate that the
HOMM exhibits temporal correlations in the sense that its temporal correlations are
stronger, in the TC ordering, than those of the IRM with the same popularity pmf [Section
9.2]. This property is shown to hold also for the LRUSM under a reasonable condition
on its stack distance pmf [Section 9.4]. Lastly, for the PMM, we show that the strength
of temporal correlations is indeed captured by the correlation parameter as expected
[Section 9.3].

With  the  TC  ordering,  we  establish  the  folk  theorem  for  miss  rates  when  the  input 
to  the  cache  is  modeled  according  to  the  PMM  under  certain  assumptions  on  the  cache 
replacement  policies  [Section  9.5.1].  Conjectures  and  simulations  are  offered  as  to  when 
this  folk  theorem  would  hold  under  the  HOMM  [Section  9.5.2]  and  under  the  LRUSM 
[Section  9.5.2].  We  also  investigate  this  folk  theorem  with  general  input  streams  under 
the  so-called  Working  Set  (WS)  algorithm  [Section  10.4]  which  is  a  cache  management 
policy  associated  with  the  working  set  model.  The  result  indicates  that  (1.2)  does  hold 
when  the  cache  holds  only  one  document  in  which  case  the  WS  algorithm  is  identified 
with  any  demand-driven  caching  with  unit  cache  size.  However,  the  folk  theorem  may 
not  hold  in  some  other  situations,  as  shown  by  counterexamples  in  the  class  of  PMM 
request  streams. 

It is also desirable to establish the folk theorem for output streams via the TC
ordering. However, there are only limited cases of interest as we recall that the output
popularity pmf $p^*$ is not necessarily the same as the input popularity pmf $p$ and that
the comparison in the TC ordering between the input stream and the output stream
requires that both popularity pmfs be identical. This shortcoming calls for further study
to develop orderings that can compare the strength of locality of reference contributed
by both components, namely popularity and temporal correlations.
1.4.3 Locality of reference metrics

Lastly,  we  investigate  whether  the  comparison  in  the  majorization  ordering  of  two  IRM 
streams  and  the  comparison  in  the  TC  ordering  of  two  request  streams  translate  into  the 
expected  comparisons  for  three  well-established  locality  of  reference  metrics,  namely, 
the  working  set  size,  the  inter-reference  time,  and  the  stack  distance. 

For  the  working  set  size,  the  majorization  ordering  of  two  IRM  streams  implies  the 
(strong)  stochastic  ordering  between  their  working  set  sizes,  while  the  TC  ordering  of 
two  request  streams  only  gives  a  comparison  between  their  average  working  set  sizes. 
In  addition,  both  the  majorization  ordering  and  the  TC  ordering  allow  a  comparison 
of  the  steady  state  inter-reference  times  in  the  convex  ordering.  However,  implications 
of  these  orderings  on  the  stack  distances  are  not  fully  understood  and  require  further 
investigation. 
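For readers less familiar with these metrics, the sketch below computes two of them on a toy trace. It relies on the usual textbook readings, stated here as assumptions rather than as the definitions adopted later in the dissertation: the working set size at time t is the number of distinct documents among the last `window` requests, and the stack distance of a request is the position of the requested document in a list kept in most-recently-used order. Function names and the trace are hypothetical.

```python
def working_set_sizes(requests, window):
    """Number of distinct items among the last `window` requests, at each time t."""
    return [len(set(requests[max(0, t - window + 1): t + 1]))
            for t in range(len(requests))]

def stack_distances(requests):
    """LRU stack distance of each request (None for a first reference)."""
    stack = []                 # most recently used item first
    distances = []
    for item in requests:
        if item in stack:
            depth = stack.index(item) + 1   # 1 means top of the stack
            stack.remove(item)
        else:
            depth = None                    # item never seen before
        distances.append(depth)
        stack.insert(0, item)               # requested item moves to the top
    return distances

stream = [1, 2, 1, 1, 3, 2, 1, 3]           # hypothetical toy trace
print(working_set_sizes(stream, window=3))  # [1, 2, 2, 2, 2, 3, 3, 3]
print(stack_distances(stream))              # [None, None, 2, 1, None, 3, 3, 3]
```

Small stack distances and small working sets both indicate that recently requested documents are being requested again soon.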

These locality of reference metrics are sometimes used for cache dimensioning and
cache performance evaluation. Thus, the aforementioned relations naturally lead to
various bounds on these performance metrics. For instance, because the IRM with uniform
popularity pmf acts as a lower bound (in the sense of majorization ordering) for any IRM
stream, its corresponding locality of reference metrics are bounds for those of other IRM
streams. Furthermore, if the request stream $R$ exhibits temporal correlations stronger
than those of the IRM with the same popularity pmf in the sense of the TC ordering, then
the performance metrics associated with this IRM, which are usually known or easier to
compute, can provide bounds for those of the request stream $R$.
1.5 Organization


The dissertation is organized as follows: The theory of majorization and its companion
notion, Schur-convexity, are summarized in Chapter 2. Basic definitions and facts
regarding positive dependence and stochastic orderings are collected in Chapter 3.

In Chapter 4, we introduce a simple framework of demand-driven caching and give
the definitions of miss rate and output of a cache. We then use the concept of
majorization ordering for comparing popularity pmfs of IRM request streams in Chapter 6.
With the majorization ordering, we establish the folk theorems for miss rates and
output streams under the random policy and the policy $A_0$. These results are extended in
Chapter 7 to a large class of demand-driven replacement policies, the so-called Random
On-demand Replacement Algorithms (RORA). In Chapter 8, we show that the folk
theorems do not hold in general for two well-known self-organizing policies, the LRU and
CLIMB policies, where counterexamples are established. Asymptotics and conjectures
under the class of IRM streams with Zipf-like popularity pmf are investigated.

In  Chapter  9,  we  use  the  concepts  of  positive  dependence  and  supermodular  ordering 
to  define  the  TC  ordering  as  a  means  to  compare  strength  of  temporal  correlations. 
This  ordering  is  then  used  to  capture  the  temporal  correlations  present  in  three  request 
models,  namely  HOMM,  PMM  and  LRUSM.  The  folk  theorem  for  miss  rates  of  the 
PMM  is  established  under  certain  assumptions  on  the  caching  policy.  Specific  results 
and  conjectures  on  this  folk  theorem  under  the  HOMM  and  the  LRUSM  are  provided. 

The  working  set  model  is  considered  in  Chapter  10  where  we  demonstrate  how 
the  majorization  ordering  between  IRM  streams  and  the  TC  ordering  between  request 
streams  can  be  translated  into  comparisons  of  the  working  set  sizes.  Next,  under  the 
Working  Set  algorithm,  we  find  that  the  folk  theorems  for  miss  rates  and  output  streams 
do  not  always  hold  for  IRM  input  streams.  For  general  input  models,  the  folk  theorem 
for miss rates holds when the cache holds only one document, but can fail otherwise.

Lastly,  in  Chapter  11,  we  show  that  the  majorization  ordering  and  the  TC  ordering 
imply  the  comparison  in  the  convex  ordering  of  the  steady  state  inter-reference  times. 
We also investigate whether these orderings would lead to some appropriate
comparisons of the stack distances.
Chapter 2


Majorization  and  Schur-convexity 

2.1  Majorization  -  A  primer 

The concept of majorization [49] provides a powerful tool to formalize statements
concerning the relative skewness in the components of two vectors, viz., the components
$(x_1, \ldots, x_N)$ of the vector $x$ are “less spread out” or “more balanced” than the
components $(y_1, \ldots, y_N)$ of the vector $y$: For vectors $x$ and $y$ in $\mathbb{R}^N$, we say that $x$ is
majorized by $y$, and write $x \prec y$, whenever the conditions

$$ \sum_{i=1}^{n} x_{[i]} \leq \sum_{i=1}^{n} y_{[i]}, \qquad n = 1, 2, \ldots, N-1 \tag{2.1} $$

and

$$ \sum_{i=1}^{N} x_i = \sum_{i=1}^{N} y_i \tag{2.2} $$

hold with $x_{[1]} \geq x_{[2]} \geq \cdots \geq x_{[N]}$ and $y_{[1]} \geq y_{[2]} \geq \cdots \geq y_{[N]}$ denoting the components
of $x$ and $y$ arranged in decreasing order, respectively.
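Conditions (2.1) and (2.2) translate directly into a numerical check. The following is a minimal sketch; the function name `is_majorized_by`, the tolerance `tol` (introduced only to absorb floating-point rounding) and the two example pmfs are assumptions made for illustration.

```python
def is_majorized_by(x, y, tol=1e-12):
    """Return True when x is majorized by y, i.e. (2.1) and (2.2) both hold."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    xs = sorted(x, reverse=True)     # components of x in decreasing order
    ys = sorted(y, reverse=True)     # components of y in decreasing order
    partial_x = partial_y = 0.0
    # (2.1): compare partial sums of the n largest components, n = 1, ..., N-1
    for xi, yi in zip(xs[:-1], ys[:-1]):
        partial_x += xi
        partial_y += yi
        if partial_x > partial_y + tol:
            return False
    # (2.2): the total sums must agree
    return abs(sum(x) - sum(y)) <= tol

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
print(is_majorized_by(uniform, skewed))   # True: the uniform pmf is more balanced
print(is_majorized_by(skewed, uniform))   # False
```

Sorting both vectors first reflects the fact that majorization depends only on the values of the components, not on their positions.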

As elegantly demonstrated in the monograph of Marshall and Olkin [49], this notion
has found widespread use in many diverse branches of mathematics and their
applications, viz. in computer databases [20] and storage [73].
We begin with a sufficient condition for majorization which is extracted from the
discussion in [49, B.1, p. 129].

Proposition 2.1 Let $x$ and $y$ be distinct elements of $\mathbb{R}^N$ such that

$$ \sum_{i=1}^{N} x_i = \sum_{i=1}^{N} y_i. \tag{2.3} $$

Whenever $x_1 \geq x_2 \geq \cdots \geq x_N$, if there exists some $k = 1, \ldots, N-1$ such that
$x_i \leq y_i$, $i = 1, \ldots, k$, and $x_i \geq y_i$, $i = k+1, \ldots, N$, then the comparison $x \prec y$ holds.


The following sufficient condition for majorization will be useful in the sequel; it
was already announced in [49, B.1.b, p. 129] without proof.

Theorem 2.2 Let $x$ and $y$ be distinct elements of $\mathbb{R}^N$ such that (2.3) holds. Whenever
$x_1 \geq x_2 \geq \cdots \geq x_N > 0$, and the ratios $y_i / x_i$, $i = 1, \ldots, N$, are decreasing in $i$, we have
the comparison $x \prec y$.


Proof. Under the condition $x_i > 0$, $i = 1, \ldots, N$, we find that (2.3) can be rewritten as

$$ \sum_{i=1}^{N} x_i \left( \frac{y_i}{x_i} - 1 \right) = 0. \tag{2.4} $$

If the ratios $y_i / x_i$, $i = 1, \ldots, N$, are decreasing in $i$, then by virtue of (2.4) there must exist
some $k$ with $1 \leq k < N$ such that

$$ \frac{y_i}{x_i} - 1 \geq 0, \qquad i = 1, \ldots, k $$

and

$$ \frac{y_i}{x_i} - 1 \leq 0, \qquad i = k+1, \ldots, N. $$

In other words, $x_i \leq y_i$ for $i = 1, \ldots, k$ and $y_i \leq x_i$ for $i = k+1, \ldots, N$, and we
readily obtain the comparison $x \prec y$ by applying Proposition 2.1. ■
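As an illustration of how the ratio condition of Theorem 2.2 is typically checked, consider two Zipf-like pmfs. The parametrization below is an assumption made for this example (the Zipf-like family is only named, not defined, in this part of the text), and $C_\alpha$ denotes the normalizing constant.

```latex
% Assumed form: p_\alpha(i) = i^{-\alpha} / C_\alpha, \quad C_\alpha = \sum_{j=1}^{N} j^{-\alpha}, \quad i = 1, \ldots, N.
% Take x = p_\alpha and y = p_\beta with 0 \le \alpha < \beta. Then
\begin{align*}
x_1 \ge x_2 \ge \cdots \ge x_N > 0,
\qquad
\sum_{i=1}^{N} x_i = \sum_{i=1}^{N} y_i = 1,
\qquad
\frac{y_i}{x_i} = \frac{C_\alpha}{C_\beta} \, i^{-(\beta - \alpha)} ,
\end{align*}
% and the ratio is decreasing in i because \beta - \alpha > 0.
% Theorem 2.2 then yields p_\alpha \prec p_\beta:
% the larger the exponent, the more skewed the pmf in the sense of majorization.
```

The point of the example is that no partial sums need to be computed; monotonicity of a single ratio suffices.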

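With any element $x$ of $\mathbb{R}^N$ such that $\sum_{i=1}^{N} x_i \neq 0$, we associate the normalized vector
$\bar{x}$ as the element of $\mathbb{R}^N$ defined by

$$ \bar{x} := \left( \sum_{i=1}^{N} x_i \right)^{-1} (x_1, \ldots, x_N). \tag{2.5} $$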

With  this  notation,  we  can  now  present  a  useful  corollary  to  Theorem  2.2. 

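Corollary 2.3 Let $x$ and $y$ be distinct elements of $\mathbb{R}^N$ such that $\sum_{i=1}^{N} y_i > 0$. Whenever
$x_1 \geq x_2 \geq \cdots \geq x_N > 0$, and the ratios $y_i / x_i$, $i = 1, \ldots, N$, are decreasing in $i$, we have
the comparison $\bar{x} \prec \bar{y}$ between the normalized vectors.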


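Proof. Under the enforced assumptions, we note the inequalities $\sum_{i=1}^{N} x_i > 0$ and
$\bar{x}_1 \geq \bar{x}_2 \geq \cdots \geq \bar{x}_N > 0$ with the ratios $\bar{y}_i / \bar{x}_i$, $i = 1, \ldots, N$, decreasing in $i$. Obviously,
$\sum_{i=1}^{N} \bar{x}_i = \sum_{i=1}^{N} \bar{y}_i = 1$ and we get the desired result by applying Theorem 2.2 to the
normalized vectors $\bar{x}$ and $\bar{y}$. ■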


The  following  reformulation  of  Corollary  2.3  is  used  in  the  sequel. 

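Lemma 2.4 Let $x$ and $y$ be distinct elements of $\mathbb{R}^N$ such that $x_i > 0$, $i = 1, \ldots, N$,
and $\sum_{i=1}^{N} y_i > 0$. If

$$ \frac{y_i}{x_i} \geq \frac{y_j}{x_j} \tag{2.6} $$

whenever $x_i > x_j$ for distinct $i, j = 1, \ldots, N$, then the comparison $\bar{x} \prec \bar{y}$ between the
normalized vectors holds.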

Before giving a proof, we introduce the following notation: Let $\sigma$ denote a permutation of $\{1, \ldots, N\}$. With any element $x$ in $\mathbb{R}^N$, we associate the permuted vector $\sigma(x)$ in $\mathbb{R}^N$ through the relation

$$\sigma(x) = (x_{\sigma(1)}, \ldots, x_{\sigma(N)}).$$

It is plain from the definition of majorization that for vectors $x$ and $y$ in $\mathbb{R}^N$, we have $x \prec y$ if and only if $\sigma(x) \prec y$ for any permutation $\sigma$ of $\{1, \ldots, N\}$.

Proof. Let $\sigma$ denote a permutation of $\{1, \ldots, N\}$ such that $x_{\sigma(1)} \ge x_{\sigma(2)} \ge \cdots \ge x_{\sigma(N)}$. The enforced monotonicity assumptions can be restated as

$$\frac{y_{\sigma(1)}}{x_{\sigma(1)}} \ge \frac{y_{\sigma(2)}}{x_{\sigma(2)}} \ge \cdots \ge \frac{y_{\sigma(N)}}{x_{\sigma(N)}}$$

and the desired result follows by an easy application of Corollary 2.3 to the elements $\sigma(x)$ and $\sigma(y)$. ■

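One such application of Lemma 2.4 is given in

Lemma 2.5 For any $\varepsilon > 0$, define the $N$-dimensional vector $p_\varepsilon$ by

$$p_\varepsilon = (1 - (N-1)\varepsilon, \varepsilon, \ldots, \varepsilon).$$

If $\varepsilon$ and $\eta$ satisfy the relation $0 < \eta < \varepsilon < \frac{1}{N}$, then it holds that $p_\varepsilon \prec p_\eta$.


Proof. As we have in mind to apply Lemma 2.4, we take $x = \bar{x} = p_\varepsilon$ and $y = \bar{y} = p_\eta$. It is plain that the requisite monotonicity assumptions of Lemma 2.4 hold when $\varepsilon$ and $\eta$ satisfy the relation $0 < \eta < \varepsilon < \frac{1}{N}$. ■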
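
As a numerical sanity check of Lemmas 2.4 and 2.5, the following Python sketch tests the ratio condition (2.6) and the majorization comparison for two vectors of the form $(1-(N-1)\varepsilon, \varepsilon, \ldots, \varepsilon)$. It assumes the standard partial-sum characterization of majorization (equal totals, dominated partial sums of the decreasingly ordered entries), which is not restated in this section. The helper names `majorizes`, `p_eps` and `ratios_decreasing`, and the values $N = 5$, $\varepsilon = 1/10$, $\eta = 1/20$, are illustrative choices only.

```python
from fractions import Fraction


def majorizes(y, x):
    """Return True if x is majorized by y: equal totals, and every partial sum
    of the decreasingly sorted x is at most the matching partial sum of y."""
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    if sum(xs) != sum(ys):
        return False
    px = py = 0
    for a, b in zip(xs, ys):
        px, py = px + a, py + b
        if px > py:
            return False
    return True


def p_eps(n, eps):
    """Vector with first entry 1 - (n-1)*eps and n-1 entries equal to eps."""
    return [1 - (n - 1) * eps] + [eps] * (n - 1)


def ratios_decreasing(x, y):
    """Condition (2.6): y_i/x_i >= y_j/x_j whenever x_i > x_j (cross-multiplied)."""
    n = len(x)
    return all(y[i] * x[j] >= y[j] * x[i]
               for i in range(n) for j in range(n) if x[i] > x[j])


N = 5
eps, eta = Fraction(1, 10), Fraction(1, 20)   # satisfies 0 < eta < eps < 1/N
x, y = p_eps(N, eps), p_eps(N, eta)
print(ratios_decreasing(x, y), majorizes(y, x), majorizes(x, y))
```

Exact rational arithmetic is used so that the comparison of totals is not disturbed by floating-point rounding.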
2.2 Schur-convexity


Key to the power of majorization is the companion notion of monotonicity associated with it: An $\mathbb{R}$-valued function $\varphi$ defined on a set $A$ of $\mathbb{R}^N$ is said to be Schur-convex (resp. Schur-concave) on $A$ if

$$\varphi(x) \le \varphi(y) \quad (\text{resp. } \varphi(x) \ge \varphi(y))$$

whenever $x$ and $y$ are elements in $A$ satisfying $x \prec y$. If $A = \mathbb{R}^N$, then $\varphi$ is simply said to be Schur-convex (resp. Schur-concave). In other words, Schur-convexity (resp. Schur-concavity) corresponds to monotone increasingness (resp. decreasingness) for majorization (viewed as a pre-order on subsets of $\mathbb{R}^N$).

Let $\{\sigma_i, i = 1, \ldots, N!\}$ be a given enumeration of all the $N!$ permutations of $\{1, \ldots, N\}$; this enumeration will be held fixed throughout this section. A subset $A$ of $\mathbb{R}^N$ is said to be symmetric if for any $x$ in $A$, the element $\sigma_i(x)$ also belongs to $A$ for each $i = 1, \ldots, N!$. Moreover, for any subset $A$ of $\mathbb{R}^N$, a mapping $\varphi : A \to \mathbb{R}$ is said to be symmetric if $A$ is symmetric and for any $x$ in $A$, we have $\varphi(\sigma_i(x)) = \varphi(x)$ for each $i = 1, \ldots, N!$. If the mapping $\varphi : A \to \mathbb{R}$ is Schur-convex (resp. Schur-concave) with symmetric $A$, then $\varphi$ is necessarily symmetric since $\sigma_i(x) \prec x \prec \sigma_i(x)$ implies $\varphi(\sigma_i(x)) = \varphi(x)$ for each $i = 1, \ldots, N!$.

In the following, we have collected some useful technical results concerning Schur-concave functions. As in [49, p. 78], for each $M = 1, \ldots, N$, the elementary symmetric function $E_{M,N} : \mathbb{R}^N \to \mathbb{R}$ is defined by

$$E_{M,N}(x) := \sum_{\{i_1, \ldots, i_M\} \in \Lambda^*(M;\mathcal{N})} x_{i_1} \cdots x_{i_M}, \quad x \in \mathbb{R}^N \tag{2.7}$$

with $\Lambda^*(M;\mathcal{N})$ denoting the collection of all unordered subsets of size $M$ of $\mathcal{N} = \{1, \ldots, N\}$. By convention we write $E_{0,N}(x) = 1$ for all $x$ in $\mathbb{R}^N$. It is well known [49, Prop. F.1, p. 78] that the function $E_{M,N}$ is Schur-concave on $\mathbb{R}_+^N$ for each $M = 0, 1, \ldots, N$.

We note from [49, Prop. C.2, p. 67] that any mapping $\varphi : A \to \mathbb{R}$ which is symmetric and convex (resp. concave) on some convex symmetric subset $A$ of $\mathbb{R}^N$ is necessarily Schur-convex (resp. Schur-concave). The following result is due to Schur [49, F.3, p. 80] and will be key to a number of proofs.


Proposition 2.6 For each $M = 1, \ldots, N$, the mapping $\Phi_{M,N} : \mathbb{R}_+^N \to \mathbb{R}$ given by¹

$$\Phi_{M,N}(x) := \frac{E_{M,N}(x)}{E_{M-1,N}(x)}, \quad x \in \mathbb{R}_+^N$$

is increasing,² symmetric and concave, hence increasing and Schur-concave.
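
To make the Schur-concavity statements concrete, the following Python sketch evaluates the elementary symmetric functions and the ratio of Proposition 2.6 on two nonnegative integer vectors with equal totals, the first being majorized by the second. The function names `elementary_symmetric` and `phi` and the two test vectors are illustrative assumptions; the check covers a single pair of vectors and is of course no substitute for the cited proofs.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def elementary_symmetric(x, m):
    """E_{m,N}(x): sum over all size-m index subsets of the product of the entries.
    For m = 0 the only subset is empty and the empty product is 1."""
    return sum(prod(c) for c in combinations(x, m))


def phi(x, m):
    """Ratio E_{m,N}(x) / E_{m-1,N}(x), set to 0 when the denominator vanishes."""
    denom = elementary_symmetric(x, m - 1)
    return Fraction(elementary_symmetric(x, m), denom) if denom else Fraction(0)


# Illustrative vectors with equal totals: x is less skewed than y (x is majorized by y).
x = [4, 3, 2, 1]
y = [7, 1, 1, 1]
for m in range(1, 5):
    print(m, elementary_symmetric(x, m), elementary_symmetric(y, m), phi(x, m), phi(y, m))
```

For a Schur-concave function the value at the majorized vector `x` should be at least the value at `y`, which is what the printed columns can be compared against.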


Proposition 2.7 Let $A$ be a convex symmetric subset of $\mathbb{R}^N$. Assume the mapping $\varphi : A \to \mathbb{R}$ to be concave and the mapping $h : \mathbb{R}^{N!} \to \mathbb{R}$ to be increasing, symmetric and concave. Then, the mapping $\varphi_h : A \to \mathbb{R}$ given by

$$\varphi_h(x) = h(\varphi(\sigma_1(x)), \ldots, \varphi(\sigma_{N!}(x))), \quad x \in A$$

is symmetric and concave, thus Schur-concave on $A$.


Proof. The mapping $\varphi_h$ is symmetric by virtue of the symmetry of $h$. The concavity of $\varphi_h$ can be shown as follows: First, for $i = 1, \ldots, N!$, we set $\varphi_i(x) = \varphi(\sigma_i(x))$ ($x \in A$); this definition is well posed since $A$ is symmetric. The concavity of $\varphi$ implies that of $\varphi_i$. For arbitrary $x$ and $y$ in $A$, and $\alpha$ in $[0, 1]$ (with $\bar{\alpha} = 1 - \alpha$), we see that $\alpha x + \bar{\alpha} y$ is also an element of $A$, and we obtain

$$\begin{aligned}
\varphi_h(\alpha x + \bar{\alpha} y) &= h(\varphi_1(\alpha x + \bar{\alpha} y), \ldots, \varphi_{N!}(\alpha x + \bar{\alpha} y)) \\
&\ge h(\alpha \varphi_1(x) + \bar{\alpha} \varphi_1(y), \ldots, \alpha \varphi_{N!}(x) + \bar{\alpha} \varphi_{N!}(y)) \\
&\ge \alpha h(\varphi_1(x), \ldots, \varphi_{N!}(x)) + \bar{\alpha} h(\varphi_1(y), \ldots, \varphi_{N!}(y)) \\
&= \alpha \varphi_h(x) + \bar{\alpha} \varphi_h(y).
\end{aligned}$$

The first inequality follows from the concavity of each of the mappings $\varphi_i$, $i = 1, \ldots, N!$, and the increasingness of $h$, while the second inequality is implied by the concavity of $h$. ■

¹For $x$ in $\mathbb{R}_+^N$ such that $E_{M-1,N}(x) = 0$, we have $E_{M,N}(x) = 0$ and set $\Phi_{M,N}(x) = 0$ by continuity.
²Here, increasing means increasing in each argument.


With vectors $t$ and $x$ in $\mathbb{R}_+^N$, we associate the element $t \cdot x$ of $\mathbb{R}_+^N$ defined by

$$t \cdot x = (t_1 x_1, \ldots, t_N x_N).$$

With this notation, we can state an important consequence of Proposition 2.7.

Proposition 2.8 Assume the mapping $\psi : \mathbb{R}_+^N \to \mathbb{R}$ to be concave and the mapping $h : \mathbb{R}^{N!} \to \mathbb{R}$ to be increasing, symmetric and concave. For any non-zero vector $t$ in $\mathbb{R}_+^N$, the mapping $\psi_t : \mathbb{R}_+^N \to \mathbb{R}$ defined by

$$\psi_t(x) = h(\psi(t \cdot \sigma_1(x)), \ldots, \psi(t \cdot \sigma_{N!}(x))), \quad x \in \mathbb{R}_+^N$$

is symmetric and concave, thus Schur-concave.


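Proof. If the mapping $\psi$ is concave, then the mapping $\tilde{\psi}_t : \mathbb{R}_+^N \to \mathbb{R}$ given by

$$\tilde{\psi}_t(x) := \psi(t \cdot x), \quad x \in \mathbb{R}_+^N$$

is also concave. We obtain the desired result by applying Proposition 2.7 with $A = \mathbb{R}_+^N$ and $\varphi = \tilde{\psi}_t$. ■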
Chapter 3


Stochastic  Orderings  and  Positive  Dependence 

3.1  Integral  stochastic  orderings 

In  this  section,  we  summarize  some  important  definitions  and  facts  concerning  the 
stochastic  orderings  of  random  vectors.  Additional  information  can  be  found  in  the 
monographs  by  Muller  and  Stoyan  [52]  and  by  Shaked  and  Shanthikumar  [59].  The 
basic  definition  of  integral  stochastic  orderings  can  be  stated  as  follows: 

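Definition 3.1 Let $\mathbb{F}$ be a class of Borel measurable functions $\varphi : \mathbb{R}^n \to \mathbb{R}$. We say that the two $\mathbb{R}^n$-valued rvs $X$ and $Y$ satisfy the order relation $X \le_{\mathbb{F}} Y$ if

$$\mathbf{E}[\varphi(X)] \le \mathbf{E}[\varphi(Y)] \tag{3.1}$$

for all functions $\varphi$ in $\mathbb{F}$ whenever the expectations exist.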

This generic definition has been specialized in the literature. Here are some important examples.

Definition 3.2 For $\mathbb{R}^n$-valued rvs $X$ and $Y$, the rv $X$ is said to be smaller than the rv $Y$ according to

• the usual stochastic ordering, written $X \le_{st} Y$, if (3.1) holds for all increasing functions $\varphi : \mathbb{R}^n \to \mathbb{R}$ whenever the expectations exist;

• the convex ordering, written $X \le_{cx} Y$, if (3.1) holds for all convex functions $\varphi : \mathbb{R}^n \to \mathbb{R}$ whenever the expectations exist;

• the concave ordering, written $X \le_{cv} Y$, if (3.1) holds for all concave functions $\varphi : \mathbb{R}^n \to \mathbb{R}$ whenever the expectations exist;

• the increasing convex ordering, written $X \le_{icx} Y$, if (3.1) holds for all increasing convex functions $\varphi : \mathbb{R}^n \to \mathbb{R}$ whenever the expectations exist; and

• the increasing concave ordering, written $X \le_{icv} Y$, if (3.1) holds for all increasing concave functions $\varphi : \mathbb{R}^n \to \mathbb{R}$ whenever the expectations exist.

Let $X$ and $Y$ be $\mathbb{R}$-valued rvs. We note from [59, p. 3] that the comparison $X \le_{st} Y$ is equivalent to

$$\mathbf{P}[X > t] \le \mathbf{P}[Y > t], \quad t \in \mathbb{R}. \tag{3.2}$$

It is also known [59] that if $X \le_{cx} Y$, we have $\mathbf{E}[X] = \mathbf{E}[Y]$ and $\mathrm{Var}(X) \le \mathrm{Var}(Y)$. In other words, $X$ has the same mean as $Y$ but less variability than $Y$. When $X \le_{icx} Y$, there exists an $\mathbb{R}$-valued rv $Z$ such that $X \le_{st} Z \le_{cx} Y$ [48, Thm. 1], whence $\mathbf{E}[X] \le \mathbf{E}[Y]$ and we can interpret $Y$ as being greater than $X$ in both "size and variability." Consequently, the orderings cx and icx are appropriate for comparing the variability of rvs. However, in the case of random vectors, it is also desirable to compare their degree of "dependence." In the next section, we describe a stochastic ordering which is well suited for comparing the dependence structures of random vectors and sequences.
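
For rvs with finitely many values, the tail criterion (3.2) can be checked directly. The Python sketch below does so, and also computes the mean and variance that appear in the necessary conditions for the convex ordering. The representation of a pmf as a dictionary, the function names, and the example distributions are assumptions made for illustration. Checking (3.2) only at the atoms is enough here because the tail functions of discrete rvs are step functions that change only at atoms.

```python
from fractions import Fraction as F


def tail(pmf, t):
    """P[X > t] for a pmf given as {value: probability}."""
    return sum(p for v, p in pmf.items() if v > t)


def leq_st(pmf_x, pmf_y):
    """Check (3.2) at every atom of either distribution."""
    atoms = set(pmf_x) | set(pmf_y)
    return all(tail(pmf_x, t) <= tail(pmf_y, t) for t in atoms)


def mean(pmf):
    return sum(v * p for v, p in pmf.items())


def variance(pmf):
    m = mean(pmf)
    return sum((v - m) ** 2 * p for v, p in pmf.items())


half = F(1, 2)
low, high = {1: half, 2: half}, {2: half, 3: half}     # high is low shifted up by 1
point, spread = {2: F(1)}, {1: half, 3: half}          # same mean, different variability
print(leq_st(low, high), leq_st(high, low))
print(mean(point), mean(spread), variance(point), variance(spread), leq_st(point, spread))
```

The second pair has equal means and ordered variances, as required for a convex-order comparison, yet it is not comparable in the usual stochastic ordering; this is only an illustration of the necessary conditions quoted above, not a test of the convex ordering itself.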

A few words on the notation in use: Two $\mathbb{R}^n$-valued rvs $X$ and $Y$ are said to be equal in law if they have the same distribution, a fact we denote by $X =_{st} Y$. For two sequences of rvs $X = \{X_n, n = 1, 2, \ldots\}$ and $Y = \{Y_n, n = 1, 2, \ldots\}$, the notation $X =_{st} Y$ indicates that $X$ and $Y$ have the same finite dimensional distributions, i.e., $(X_1, \ldots, X_n) =_{st} (Y_1, \ldots, Y_n)$ for all $n = 1, 2, \ldots$. Lastly, convergence in law or in distribution (with $t$ going to infinity) is denoted by $\Longrightarrow_t$.

3.2  Supermodular  ordering 

Several  stochastic  orderings  have  been  found  well  suited  for  comparing  the  dependence 
structures  of  random  vectors.  Here  we  rely  on  the  supermodular  ordering  which  has 
been  used  recently  in  several  queueing  and  reliability  applications  [7,  8,  9,  60,  66].  We 
begin  by  introducing  the  class  of  functions  associated  with  this  ordering. 

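Definition 3.3 A function $\varphi : \mathbb{R}^n \to \mathbb{R}$ is said to be supermodular (sm) if

$$\varphi(x \vee y) + \varphi(x \wedge y) \ge \varphi(x) + \varphi(y), \quad x, y \in \mathbb{R}^n$$

where we set $x \vee y = (x_1 \vee y_1, \ldots, x_n \vee y_n)$ and $x \wedge y = (x_1 \wedge y_1, \ldots, x_n \wedge y_n)$.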
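
The defining inequality can be tested by brute force on a finite grid, which is a quick way to build intuition for which functions are supermodular. The Python sketch below is such a check; the function name `is_supermodular`, the grid, and the three test functions are illustrative assumptions. A grid check can only refute supermodularity; passing it does not prove the property on all of $\mathbb{R}^n$.

```python
from itertools import product


def is_supermodular(phi, grid, n):
    """Test phi(x v y) + phi(x ^ y) >= phi(x) + phi(y) for all pairs in grid^n,
    where v and ^ are the componentwise maximum and minimum."""
    points = list(product(grid, repeat=n))
    for x in points:
        for y in points:
            join = tuple(max(a, b) for a, b in zip(x, y))
            meet = tuple(min(a, b) for a, b in zip(x, y))
            if phi(join) + phi(meet) < phi(x) + phi(y):
                return False
    return True


grid = [-1, 0, 1, 2]
print(is_supermodular(lambda v: v[0] * v[1], grid, 2))       # product of coordinates
print(is_supermodular(lambda v: min(v[0], v[1]), grid, 2))   # componentwise minimum
print(is_supermodular(lambda v: -v[0] * v[1], grid, 2))      # negated product
```

Integer grid points keep the comparisons exact; the negated product fails already at the pair $x = (1, 0)$, $y = (0, 1)$.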

The supermodular ordering is the integral ordering associated with the class of supermodular functions.

Definition 3.4 For $\mathbb{R}^n$-valued rvs $X$ and $Y$, the rv $X$ is said to be smaller than the rv $Y$ according to the supermodular ordering, written $X \le_{sm} Y$, if (3.1) holds for all supermodular Borel measurable functions $\varphi : \mathbb{R}^n \to \mathbb{R}$ whenever the expectations exist.

It is a simple matter to check [8] that for any $\mathbb{R}^n$-valued rvs $X$ and $Y$, the comparison $X \le_{sm} Y$ necessarily implies the stochastic equalities

$$X_i =_{st} Y_i, \quad i = 1, \ldots, n, \tag{3.3}$$

as well as the covariance comparisons

$$\mathrm{Cov}[X_i, X_j] \le \mathrm{Cov}[Y_i, Y_j], \quad i, j = 1, 2, \ldots, n. \tag{3.4}$$

Thus, the comparison $X \le_{sm} Y$ represents a possible formalization of the statement to the effect that "$Y$ is more positively dependent than $X$."

The  definition  of  the  supermodular  ordering  can  be  extended  to  sequences  of  rvs  in 
a  natural  way. 

Definition 3.5 We say that the two $\mathbb{R}$-valued sequences $X = \{X_n, n = 1, 2, \ldots\}$ and $Y = \{Y_n, n = 1, 2, \ldots\}$ satisfy the relation $X \le_{sm} Y$ if $(X_1, \ldots, X_n) \le_{sm} (Y_1, \ldots, Y_n)$ for all $n = 1, 2, \ldots$.

In  what  follows,  we  introduce  several  concepts  of  positive  dependence. 

3.3  Positive  dependence 

Positive dependence in a collection of rvs can be captured in several ways. The association of rvs is one of the most useful such characterizations; it was introduced by Esary, Proschan and Walkup [31] and has proved useful in various settings [6, 42] (and references therein).

Definition 3.6 The $\mathbb{R}^n$-valued rv $X = (X_1, \ldots, X_n)$ is said to be associated¹ if the inequality

$$\mathbf{E}[f(X) g(X)] \ge \mathbf{E}[f(X)] \, \mathbf{E}[g(X)]$$

holds for all increasing functions $f, g : \mathbb{R}^n \to \mathbb{R}$ for which the expectations exist.

¹Sometimes, we say that the $\mathbb{R}$-valued rvs $X_1, \ldots, X_n$ are associated.

A stronger notion of positive dependence is given by

Definition 3.7 The $\mathbb{R}^n$-valued rv $X = (X_1, \ldots, X_n)$ is said to be conditionally increasing in sequence (CIS) if for each $k = 1, 2, \ldots, n-1$, the family of conditional distributions $\{[X_{k+1} \mid X_1 = x_1, \ldots, X_k = x_k]\}$ is stochastically increasing in $x = (x_1, \ldots, x_k)$.

More precisely, this definition states that for each $k = 1, 2, \ldots, n-1$, for $x$ and $y$ in $\mathbb{R}^k$ with $x \le y$ componentwise, it holds that

$$[X_{k+1} \mid (X_1, \ldots, X_k) = x] \le_{st} [X_{k+1} \mid (X_1, \ldots, X_k) = y]$$

where $[X_{k+1} \mid (X_1, \ldots, X_k) = x]$ denotes any rv distributed according to the conditional distribution of $X_{k+1}$ given $(X_1, \ldots, X_k) = x$ (with a similar interpretation for $[X_{k+1} \mid (X_1, \ldots, X_k) = y]$).

We next show how the supermodular ordering induces a notion of positive dependence but first, a definition:

Definition 3.8 For $\mathbb{R}^n$-valued rvs $X$ and $\hat{X}$, we say that $\hat{X} = (\hat{X}_1, \ldots, \hat{X}_n)$ is an independent version of $X = (X_1, \ldots, X_n)$ if the rvs $\hat{X}_1, \hat{X}_2, \ldots, \hat{X}_n$ are mutually independent with $\hat{X}_k =_{st} X_k$, for each $k = 1, \ldots, n$.

From the concept of supermodular ordering, the positive dependence between the components $X_1, \ldots, X_n$ of the $\mathbb{R}^n$-valued rv $X$ can be formalized by requiring that the rv $X$ be larger in the supermodular ordering than its independent version $\hat{X}$. This gives rise to the following notion of positive dependence [52]:

Definition 3.9 The $\mathbb{R}^n$-valued rv $X = (X_1, \ldots, X_n)$ is said to be positive supermodular dependent (PSMD) if

$$\hat{X} \le_{sm} X \tag{3.5}$$

where $\hat{X}$ is the independent version of $X$.
The next result explores the relationships between the various notions of positive dependence introduced thus far.

Theorem 3.10 Consider an $\mathbb{R}^n$-valued rv $X = (X_1, \ldots, X_n)$.

(a) If $X$ is CIS, then $X$ is associated; and

(b) If $X$ is associated, then $X$ is PSMD.

Part (a) can be found in the monograph by Barlow and Proschan [6, Thm. 4.7, p. 146] while Part (b) has been established recently by Christofides and Vaggelatou [21, Thm. 1]. Earlier, Meester and Shanthikumar [51, Thm. 3.8] have shown that CIS implies PSMD.

Lastly,  we  naturally  extend  these  definitions  to  sequences  of  rvs  along  the  lines  of 
Definition  3.5. 

Definition 3.11 For sequences of $\mathbb{R}$-valued rvs $X = \{X_n, n = 1, 2, \ldots\}$ and $\hat{X} = \{\hat{X}_n, n = 1, 2, \ldots\}$, we say that $\hat{X}$ is an independent version of $X$ if the rvs $\{\hat{X}_n, n = 1, 2, \ldots\}$ are mutually independent with $\hat{X}_n =_{st} X_n$ for all $n = 1, 2, \ldots$.

Definition 3.12 We say that the $\mathbb{R}$-valued sequence $X = \{X_n, n = 1, 2, \ldots\}$ is associated (resp. CIS, PSMD) if for each $n = 1, 2, \ldots$, the $\mathbb{R}^n$-valued rv $(X_1, \ldots, X_n)$ is associated (resp. CIS, PSMD).
Chapter 4


Demand-driven Caching


Consider a universe $\mathcal{N}$ of $N$ cacheable documents, say $\mathcal{N} := \{1, \ldots, N\}$. The system is composed of a server where a copy of each of these $N$ documents is available, and of a cache of size $M$ ($1 < M < N$). Documents are first requested at the cache: If the requested document has a copy already in cache (i.e., a hit), this copy is downloaded from the cache by the user. If the requested document is not in cache (i.e., a miss), a copy is requested instead from the server to be put in the cache. If the cache is already full, then a document already in cache is evicted to make place for the copy of the document just requested. The document selected for eviction is determined through a cache replacement or eviction policy.¹

We now develop a mathematical framework to address some of the issues discussed in this dissertation. Additional details are available in the monographs by Aven, Coffman and Kogan [2] and by Coffman and Denning [24]. We begin with some notation that will be used repeatedly: Let $\Lambda^*(M;\mathcal{N})$ be the collection of all unordered subsets of size $M$ of $\mathcal{N} = \{1, \ldots, N\}$, and let $\Lambda(M;\mathcal{N})$ be the collection of all ordered sequences of $M$ distinct elements from $\mathcal{N}$. We write $\{i_1, \ldots, i_M\}$ (resp. $(i_1, \ldots, i_M)$) to denote an element in $\Lambda^*(M;\mathcal{N})$ (resp. $\Lambda(M;\mathcal{N})$). For each $i = 1, \ldots, N$, let $\Lambda_i^*(M;\mathcal{N})$ (resp. $\Lambda_i(M;\mathcal{N})$) denote the set of elements in $\Lambda^*(M;\mathcal{N})$ (resp. $\Lambda(M;\mathcal{N})$) which do not contain $i$, i.e.,

$$\Lambda_i^*(M;\mathcal{N}) := \{s = \{i_1, \ldots, i_M\} \in \Lambda^*(M;\mathcal{N}) : i \notin s\}$$

and

$$\Lambda_i(M;\mathcal{N}) := \{s = (i_1, \ldots, i_M) \in \Lambda(M;\mathcal{N}) : i \notin s\}.$$

¹We use the terms interchangeably.

4.1  A  simple  framework 

Consecutive user requests are modeled by a sequence of $\mathcal{N}$-valued rvs $R = \{R_t, t = 0, 1, \ldots\}$. For simplicity we say that request $R_t$ occurs at time $t = 0, 1, \ldots$. Let $S_t$ denote the cache just before time $t$ so that $S_t$ is a subset of $\mathcal{N}$ with at most $M$ elements. Also, the decision to be performed according to the eviction policy in force is the identity $U_t$ of the document in $S_t$ which needs to be evicted in order to make room for the request $R_t$ (if the cache is already full).

Demand-driven caching considered here is characterized by the dynamics

$$S_{t+1} = \begin{cases} S_t & \text{if } R_t \in S_t \\ S_t + R_t & \text{if } R_t \notin S_t, \ |S_t| < M \\ S_t - U_t + R_t & \text{if } R_t \notin S_t, \ |S_t| = M \end{cases} \tag{4.1}$$

for all $t = 0, 1, \ldots$, where $|S_t|$ denotes the cardinality of the set $S_t$, and $S_t - U_t + R_t$ denotes the subset of $\{1, \ldots, N\}$ obtained from $S_t$ by removing $U_t$ and then adding $R_t$ to it, in that order. These dynamics reflect the following operational assumptions: (i) Actions are taken only at the time requests are made, hence the terminology demand-driven caching; (ii) a requested document not in cache is always added to the cache if the cache is not full at the time of request; and (iii) eviction is mandatory if the request $R_t$ is not in cache $S_t$ and the cache $S_t$ is full, i.e., $|S_t| = M$.
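
The three cases of the dynamics (4.1) translate directly into a one-step update. The Python sketch below is one possible rendering; the function `cache_step`, its argument names, and the victim rule `evict_smallest_label` are hypothetical and serve only to exercise the three branches (the victim rule is not one of the policies studied in the text).

```python
def cache_step(cache, request, capacity, choose_victim):
    """Return S_{t+1} from S_t = cache and R_t = request, following (4.1)."""
    cache = set(cache)
    if request in cache:                     # hit: the cache is unchanged
        return cache
    if len(cache) < capacity:                # miss with free room: add the document
        return cache | {request}
    victim = choose_victim(cache, request)   # miss with a full cache: evict U_t first
    return (cache - {victim}) | {request}


def evict_smallest_label(cache, request):
    """Hypothetical victim rule used only for this illustration."""
    return min(cache)


cache = set()
for r in [1, 2, 3, 1, 4]:
    cache = cache_step(cache, r, capacity=2, choose_victim=evict_smallest_label)
print(sorted(cache))
```

Passing the victim rule as a function keeps the update generic: any eviction policy whose decision depends only on the current cache set and request can be plugged in without changing `cache_step`.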
4.2 Web request models and reduced dynamics


Throughout we assume the following for the request stream $R = \{R_t, t = 0, 1, \ldots\}$: The popularity pmf $p = (p(1), \ldots, p(N))$ of $R$ exists and is defined as the non-random limits

$$p(i) = \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[R_\tau = i] \quad \text{a.s.}, \quad i = 1, \ldots, N. \tag{4.2}$$

To avoid uninteresting situations, it is always the case that

$$p(i) > 0, \quad i = 1, \ldots, N. \tag{4.3}$$

A pmf $p$ on $\{1, \ldots, N\}$ satisfying (4.3) is said to be admissible.²

Under this non-triviality condition (4.3), every document will eventually be requested as we note that

$$\lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[R_\tau = i] = p(i) > 0 \quad \text{a.s.}$$

under the assumption (4.2). Thus, as we have in mind to study long term characteristics under demand-driven replacement policies, there is no loss of generality in assuming (as we do from now on) that the cache is full, i.e., for all $t = 0, 1, \ldots$, we have $|S_t| = M$ and (4.1) simplifies to

$$S_{t+1} = \begin{cases} S_t & \text{if } R_t \in S_t \\ S_t - U_t + R_t & \text{if } R_t \notin S_t. \end{cases} \tag{4.4}$$


A number of request models will be considered here, the best known one being the Independent Reference Model (IRM). The IRM will serve as the first model for which we attempt to formalize the folk theorems introduced in this dissertation. It is a basic model which is often used for checking various properties of caching systems [13]. Moreover, recent results by Jelenkovic and Radovanovic [38] and by Sugimoto and Miyoshi [63] suggest some form of insensitivity of caching systems to the statistics of requests. However, the IRM does not possess any of the correlations which have been observed in Web reference streams, thus making it less suitable for modeling streams of requests with strong temporal correlations. Some examples of models displaying temporal correlations will be discussed later in Chapter 9.

²Additional assumptions on the request streams, e.g., stationarity and ergodicity, will be required in some parts of the dissertation and will be stated when appropriate.

4.3  Cache  states  and  eviction  policies 

The decisions $\{U_t, t = 0, 1, \ldots\}$ are determined through an eviction policy; several examples will be presented shortly. For most eviction policies considered in the literature, as well as here, the dynamics of the cache can be characterized through the evolution of suitably defined variables $\{\Omega_t, t = 0, 1, \ldots\}$ where $\Omega_t$ is known as the state of the cache at time $t$.

Consider an eviction policy $\pi$. The cache state is specific to the eviction policy and is selected with the following in mind: (i) The set $S_t$ of documents in the cache at time $t$ can be recovered from $\Omega_t$; (ii) the cache state $\Omega_{t+1}$ is fully determined through the knowledge of the triple $(\Omega_t, R_t, U_t)$ in a way that is compatible with the dynamics (4.4); and (iii) the eviction decision $U_t$ at time $t$ can be expressed as a function of the past $(\Omega_0, R_0, U_0, \ldots, \Omega_{t-1}, R_{t-1}, U_{t-1}, \Omega_t, R_t)$ (possibly through suitable randomization), i.e., for each $t = 0, 1, \ldots$, there exists a mapping $\pi_t$ such that

$$U_t = \pi_t(\Omega_0, R_0, U_0, \ldots, \Omega_{t-1}, R_{t-1}, U_{t-1}, \Omega_t, R_t; \Xi_t) \tag{4.5}$$

where $\Xi_t$ is a rv taken independent of the past $(\Omega_0, R_0, U_0, \ldots, \Omega_{t-1}, R_{t-1}, U_{t-1}, \Omega_t, R_t)$. Collectively, the mappings $\{\pi_t, t = 0, 1, \ldots\}$ define the eviction policy $\pi$.

We close this section with some examples of eviction policies which have been discussed in the literature (see e.g., [2, 24]):

According to the random policy, when the cache is full, the document to be evicted from the cache is selected randomly according to the uniform distribution.

Any permutation $\sigma$ of $\{1, \ldots, N\}$ induces an ordering of the documents by considering the documents $\sigma(1), \sigma(2), \ldots, \sigma(N)$ as "ordered" in decreasing order. This ranking of the documents allows us to define the eviction policy $A_\sigma$ as follows: When at time $t = 0, 1, \ldots$, the cache $S_t$ is full and the requested document $R_t$ is not in the cache, the policy $A_\sigma$ prescribes the eviction of the document $U_t$ given by

$$U_t = \arg\max \left( \sigma^{-1}(j) : j \in S_t \right). \tag{4.6}$$

The documents $\sigma(1), \ldots, \sigma(M-1)$, once loaded in the cache, will never be evicted, and in the steady state, the cache under the policy $A_\sigma$ will contain the documents $\sigma(1), \ldots, \sigma(M-1)$.

The so-called policy $A_0$ is associated with the underlying popularity pmf $p$ of the request stream, and evicts the least popular document in the cache, i.e., when the replacement is required at time $t = 0, 1, \ldots$, select $U_t$ to be

$$U_t = \arg\min \left( p(j) : j \in S_t \right). \tag{4.7}$$

This policy $A_0$ coincides with the policy $A_{\sigma^*}$ associated with the permutation $\sigma^*$ of $\{1, \ldots, N\}$ which orders the components of the underlying pmf $p$ in decreasing order, namely $p(\sigma^*(1)) \ge p(\sigma^*(2)) \ge \ldots \ge p(\sigma^*(N))$.

Under the random policy and the policies $A_\sigma$, we can take the cache state to be the (unordered) set of documents in the cache, i.e., the cache state is an element of $\Lambda^*(M;\mathcal{N})$ and $\Omega_t = S_t$ for all $t = 0, 1, \ldots$.

The First-in/First-out (FIFO) policy replaces the document which has been in cache for the longest time, while the Least-Recently-Used (LRU) policy evicts the least recently requested document already in cache.

The CLIMB policy is a close relative of the LRU policy. It ranks documents in cache according to their recency of access: If the requested document is not in the cache, the document at the last position (position $M$) is evicted and replaced by the new document. If the requested document is in the cache at position $i$, $i = 2, \ldots, M$, it exchanges position with the document at position $i - 1$. The cache remains unchanged if the requested document is in the cache at position 1.

The definition of the FIFO, LRU and CLIMB policies necessitates that the cache state be an element of $\Lambda(M;\mathcal{N})$ with $\Omega_t$ being a permutation of the elements in $S_t$ for all $t = 0, 1, \ldots$.
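
The ordered cache states used by FIFO, LRU and CLIMB can be sketched with plain Python lists. In the sketch below the list layout (index 0 is the most recent document for LRU, the newest arrival for FIFO, and position 1 for CLIMB), the function names, and the handling of a cache that is not yet full are illustrative assumptions rather than part of the formal definitions.

```python
def lru_step(state, r, capacity):
    """LRU: state lists documents from most to least recently requested."""
    if r in state:
        return [r] + [d for d in state if d != r]    # hit: move to the front
    return [r] + state[:capacity - 1]                # miss: drop the last if full


def fifo_step(state, r, capacity):
    """FIFO: state lists documents from newest to oldest arrival."""
    if r in state:
        return list(state)                           # hits do not reorder
    return [r] + state[:capacity - 1]                # miss: drop the oldest if full


def climb_step(state, r, capacity):
    """CLIMB: state[0] is position 1; a hit moves the document up one position."""
    state = list(state)
    if r in state:
        i = state.index(r)
        if i > 0:
            state[i - 1], state[i] = state[i], state[i - 1]
        return state
    if len(state) < capacity:
        return state + [r]
    state[-1] = r                                    # replace the document at position M
    return state


def run(step, requests, capacity):
    state = []
    for r in requests:
        state = step(state, r, capacity)
    return state


requests = [1, 2, 3, 3, 4, 3]
for step in (lru_step, fifo_step, climb_step):
    print(step.__name__, run(step, requests, capacity=3))
```

All three policies agree on what happens at a miss in terms of cache size; they differ only in how hits reorder the state and therefore in which document sits in the position that is evicted next.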

4.4  Miss  rate 

A standard performance metric to evaluate and compare various caching policies is the miss rate of a cache. This quantity has the interpretation of being the long-term frequency of the event that the requested document is not in the cache, and therefore determines the effectiveness of a caching policy.

For a given request stream $R = \{R_t, t = 0, 1, \ldots\}$, the miss rate $M_\pi(R)$ under a cache replacement policy $\pi$ is defined as the a.s. limit

$$M_\pi(R) = \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[R_\tau \notin S_\tau] \quad \text{a.s.} \tag{4.8}$$

(whenever the limit exists) where $S_\tau$ denotes the set of documents in cache operating under the replacement policy $\pi$ at time $\tau$ when the input to the cache is the request stream $R$. Almost sure convergence in (4.8) (and elsewhere) is taken under the probability measure on the sequence of rvs $\{\Omega_t, R_t, U_t, t = 0, 1, \ldots\}$ induced by the request stream $\{R_t, t = 0, 1, \ldots\}$ through the eviction policy $\pi$.

The existence of the limit (4.8) depends on the request stream $R$ and on the cache replacement policy $\pi$. Even in the case where the limit (4.8) exists, its expression is not known for general classes of request streams. However, when the request stream $R$ is the IRM, the limit (4.8) exists under most cache replacement policies of interest. This special case will be treated in Chapter 5.
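
A finite-horizon version of the time average in (4.8) is easy to estimate by simulation. The Python sketch below does this for the LRU policy fed by i.i.d. requests; the function names, the two popularity vectors, the horizon and the seed are illustrative assumptions. The cache starts empty rather than full, which only affects finitely many initial requests and hence not the long-run average.

```python
import random


def lru_miss_fraction(requests, capacity):
    """Fraction of requests that miss under LRU, starting from an empty cache."""
    state, misses = [], 0            # state: most recently requested document first
    for r in requests:
        if r in state:
            state.remove(r)          # hit: the document will be moved to the front
        else:
            misses += 1
            if len(state) == capacity:
                state.pop()          # full cache: evict the least recently requested
        state.insert(0, r)
    return misses / len(requests)


def iid_stream(p, length, seed=0):
    """Draw i.i.d. requests from the pmf p over documents 1, ..., len(p)."""
    rng = random.Random(seed)
    return rng.choices(range(1, len(p) + 1), weights=p, k=length)


p_skewed = [0.70, 0.10, 0.10, 0.10]
p_uniform = [0.25, 0.25, 0.25, 0.25]
for p in (p_skewed, p_uniform):
    print(p, lru_miss_fraction(iid_stream(p, 100_000), capacity=2))
```

For the uniform pmf every request is equally likely to be any document regardless of the cache contents, so the estimate should be close to $(N - M)/N$; the skewed pmf gives a point of comparison for the first folk theorem.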

4.5  Output 

Under the demand-driven caching operation (4.4), the output of the cache is the sequence of requests that incur a miss, i.e., when the incoming request cannot find the desired document in the cache. More precisely, a miss occurs at time $t$ if $R_t$ is not in $S_t$. Thus, we define recursively the time indices $\{\nu_k, k = 0, 1, \ldots\}$ by

$$\nu_0 = 0; \quad \nu_{k+1} := \nu_k + \eta_{k+1}, \quad k = 0, 1, \ldots$$

and

$$\eta_{k+1} := \inf\{\ell = 1, 2, \ldots : R_{\nu_k + \ell} \notin S_{\nu_k + \ell}\}$$

with the convention $\eta_{k+1} = \infty$ if either $\nu_k = \infty$ or if $\nu_k$ is finite but the set of indices entering the definition of $\eta_{k+1}$ is empty. With $\delta$ denoting an element not in $\mathcal{N}$, we define the output process $R^* = \{R^*_k, k = 1, 2, \ldots\}$ simply as

$$R^*_k := \begin{cases} R_{\nu_k} & \text{if } \nu_k < \infty \\ \delta & \text{if } \nu_k = \infty \end{cases}$$

for each $k = 1, 2, \ldots$. The requests $\{R^*_k, k = 1, 2, \ldots\}$ are those requests among $\{R_t, t = 0, 1, \ldots\}$ which incur a miss and which get forwarded to the server (or to the higher level cache in a hierarchical caching system).
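
The construction of the output process amounts to filtering the input stream by the miss events. The Python sketch below extracts the output of an LRU cache and its empirical popularity; LRU is used only as a convenient example policy, and the function names, the initial cache state and the request sequence are illustrative assumptions. Since the recursion starts from the index 0 and looks for misses at strictly later times, only misses at times $t \ge 1$ are recorded.

```python
from collections import Counter


def lru_step(state, r, capacity):
    """One LRU update; state lists documents from most to least recently requested."""
    if r in state:
        return [r] + [d for d in state if d != r]
    return [r] + state[:capacity - 1]


def output_stream(requests, initial_state, capacity):
    """Requests at times t >= 1 that miss in the cache seen just before time t."""
    state, out = list(initial_state), []
    for t, r in enumerate(requests):
        if t >= 1 and r not in state:
            out.append(r)
        state = lru_step(state, r, capacity)
    return out


def empirical_pmf(stream, n_docs):
    """Relative frequency of each document 1, ..., n_docs in a non-empty stream."""
    counts = Counter(stream)
    return [counts[i] / len(stream) for i in range(1, n_docs + 1)]


out = output_stream([3, 1, 2, 1, 3], initial_state=[1, 2], capacity=2)
print(out, empirical_pmf(out, 3))
```

On a finite input the special symbol for an exhausted output is not needed: the returned list simply ends after the last miss.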

The statistics of the output stream $\{R^*_k, k = 1, 2, \ldots\}$ are determined by the statistics of the input stream $\{R_t, t = 0, 1, \ldots\}$ and by the cache replacement policy $\pi$ in use. We are interested in evaluating the popularity pmf $p^*_\pi = (p^*_\pi(1), \ldots, p^*_\pi(N))$ defined by

$$p^*_\pi(i) := \lim_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} \mathbf{1}[R^*_k = i] \quad \text{a.s.} \tag{4.9}$$

for each $i = 1, \ldots, N$, whenever these limits exist.

As with the limit (4.8) of the miss rate, the existence and form of the limits (4.9) are not known for general classes of input models. However, as we shall see in the next chapter, when the input stream is modeled according to the IRM, the limits (4.9) exist and admit simple expressions for most cache replacement policies of interest.
Chapter 5


The Independent Reference Model (IRM)


The Independent Reference Model (IRM) is a basic model for Web reference streams; it is commonly used to evaluate various properties of caching policies [13]. We say that the request stream $R = \{R_t, t = 0, 1, \ldots\}$ is an IRM with popularity pmf $p$ if the rvs $\{R_t, t = 0, 1, \ldots\}$ are i.i.d. rvs distributed according to the pmf $p$. In this chapter, we show that under the IRM with popularity pmf $p$ and under a particular cache replacement policy $\pi$, the limit (4.8) for the miss rate and the limits (4.9) for the output popularity pmf $p^*_\pi$ exist and admit simple expressions whenever the a.s. limit

$$\mu_\pi(s; p) = \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[S_\tau = s] \quad \text{a.s.} \tag{5.1}$$

exists for each element $s$ in $\Lambda^*(M;\mathcal{N})$ with $S_\tau$ being the set of documents in cache at time $\tau$. We now discuss these results for the miss rate and for the output popularity pmf, respectively.

5.1  Miss  rate  under  the  IRM 

Before stating the main result, we note from the definition of the IRM that the requests $\{R_t, t = 0, 1, \ldots\}$ are characterized solely by the popularity pmf $p$ and thus all IRM streams with the same popularity pmf $p$ must produce the same miss rate (4.8) under a given replacement policy $\pi$. Therefore, it is more appropriate to view the miss rate under the IRM as a function of the popularity pmf $p$ and denote the limit (4.8) by $M_\pi(p)$ to reflect this fact.

Theorem 5.1 Consider an eviction policy $\pi$ such that the limits (5.1) exist under the IRM with popularity pmf $p$. Then, the limit (4.8) exists and is given by

$$M_\pi(p) = \sum_{i=1}^{N} p(i) \sum_{s \in \Lambda_i^*(M;\mathcal{N})} \mu_\pi(s; p) \tag{5.2}$$

$$\phantom{M_\pi(p)} = \sum_{s \in \Lambda^*(M;\mathcal{N})} \mu_\pi(s; p) \sum_{i \notin s} p(i). \tag{5.3}$$

Theorem 5.1 is established in the process of proving Theorem 5.2 in Section 5.3. The existence of the limits (5.1) is a mild assumption which is satisfied under all eviction policies of interest considered here (and in the literature). Indeed, under the IRM with popularity pmf $p$, the sequence of cache states $\{\Omega_t, t = 0, 1, \ldots\}$ usually forms a Markov chain over a finite state space, and standard ergodic results for finite state Markov chains readily yield the existence of the limits (5.1). This issue will be briefly discussed in each situation at the appropriate time. Note also that the limits (4.8) and (5.1) under the IRM are often constants which are independent of the initial cache state $\Omega_0$. However this is not always the case as we shall see in the discussion of RORA policies [Chapter 7].

5.2  Output  under  the  IRM 

In this section, we establish the existence and form of the limits (4.9) when the input to
the cache is the IRM with popularity pmf p. We again do so under the assumption that
the a.s. limit (5.1) exists for each s in $\Lambda^*(M;\mathcal{N})$. The main result is contained in

Theorem 5.2 Consider an eviction policy $\pi$ such that the limits (5.1) exist under the
IRM with popularity pmf p. For each $i = 1, \ldots, N$, the limit (4.9) exists and is given
by

\[ p^*_\pi(i) = \lim_{K\to\infty} \frac{1}{K} \sum_{k=1}^{K} \mathbf{1}[R^*_k = i] = \frac{p(i)\, m_\pi(i;p)}{\sum_{j=1}^{N} p(j)\, m_\pi(j;p)} \tag{5.4} \]

where we have set

\[ m_\pi(i;p) := \sum_{s \in \Lambda^*(M;\mathcal{N}):\ i \notin s} \mu_\pi(s;p). \tag{5.5} \]

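A proof of Theorem 5.2 is given in the next section. Note that the existence of the limits
(5.1) implies

\[
\begin{aligned}
m_\pi(i;p) &= \sum_{s \in \Lambda^*(M;\mathcal{N}):\ i \notin s} \left( \lim_{t\to\infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[S_\tau = s] \right) \\
&= \lim_{t\to\infty} \frac{1}{t} \sum_{\tau=1}^{t} \sum_{s \in \Lambda^*(M;\mathcal{N}):\ i \notin s} \mathbf{1}[S_\tau = s] \\
&= \lim_{t\to\infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[i \notin S_\tau] \quad \text{a.s.}
\end{aligned}
\tag{5.6}
\]

for each $i = 1, \ldots, N$, and $m_\pi(i;p)$ thus represents the fraction of time that document
i is not in the cache. This quantity is determined by the popularity pmf p of the
IRM input and by the eviction policy $\pi$ in use.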

Inspection of (5.2) and (5.5) reveals that

\[ \sum_{i=1}^{N} p(i)\, m_\pi(i;p) = M_\pi(p). \tag{5.7} \]

This leads via (5.4) to a simple connection between the miss rate of an eviction policy
and the pmf of its output in the form

\[ p^*_\pi(i) = \frac{p(i)\, m_\pi(i;p)}{M_\pi(p)}, \quad i = 1, \ldots, N. \tag{5.8} \]

Thus, with the IRM input, we can view $p^*_\pi(i)$ as the ratio of the miss rate of the cache
when the requested document is i to the overall miss rate of the cache.
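The relations (5.3), (5.5), (5.7) and (5.8) can be evaluated mechanically once the cache-state frequencies $\mu_\pi(s;p)$ are known. The following Python fragment is a minimal sketch under stated assumptions: the function names and the dictionary representation of $\mu_\pi$ are illustrative choices, not part of the text, and the sample input uses the case M = 1, where the cache holds the last requested document so that $\mu_\pi(\{i\};p) = p(i)$.

```python
def miss_profile(p, mu):
    """Evaluate (5.5), (5.7) and (5.8).

    p  : list of popularities; document i has popularity p[i - 1]
    mu : dict mapping a frozenset of documents (a cache state s) to its
         long-run frequency mu(s; p); the frequencies are assumed to sum to 1
    """
    docs = range(1, len(p) + 1)
    # (5.5): m(i) is the total frequency of the states that do not contain i
    m = [sum(w for s, w in mu.items() if i not in s) for i in docs]
    # (5.7): the overall miss rate
    miss_rate = sum(p[i - 1] * m[i - 1] for i in docs)
    # (5.8): the output popularity pmf
    p_star = [p[i - 1] * m[i - 1] / miss_rate for i in docs]
    return m, miss_rate, p_star


def miss_rate_by_state(p, mu):
    """Evaluate (5.3): average over states of the mass outside the cache."""
    docs = range(1, len(p) + 1)
    return sum(w * sum(p[i - 1] for i in docs if i not in s)
               for s, w in mu.items())


# Illustrative input (hypothetical numbers): N = 3 documents, cache size M = 1.
p = [0.5, 0.3, 0.2]
mu = {frozenset([i]): p[i - 1] for i in (1, 2, 3)}
m, miss_rate, p_star = miss_profile(p, mu)
```

For this input both `miss_rate` and `miss_rate_by_state(p, mu)` evaluate $\sum_i p(i)(1 - p(i))$, in agreement with the equality of (5.2) and (5.3).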


5.3  Proofs  of  Theorems  5.1  and  5.2 


Key to the proofs of both Theorems 5.1 and 5.2 is the following observation: For each
$t = 0, 1, \ldots$, the rvs $\Omega_t$ and $R_t$ are independent. Hence, by independence of the rvs
$\{R_t,\ t = 0, 1, \ldots\}$, upon invoking Rajchman's version of the Strong Law of Large Numbers [22,
Thm. 5.1.2, p. 103], we find

\[ \lim_{t\to\infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[S_\tau = s] \left( \mathbf{1}[R_\tau = i] - p(i) \right) = 0 \quad \text{a.s.} \tag{5.9} \]

for each s in $\Lambda^*(M;\mathcal{N})$ and $i = 1, \ldots, N$.

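For each $t = 1, 2, \ldots$, let $K(t)$ denote the total number of misses up to time t.
Obviously, we have

\[ K(t) := \sum_{\tau=1}^{t} \mathbf{1}[R_\tau \notin S_\tau] = \sum_{i=1}^{N} \sum_{\tau=1}^{t} \mathbf{1}[i \notin S_\tau]\, \mathbf{1}[R_\tau = i]. \tag{5.10} \]

Fix $i = 1, \ldots, N$. We note that

\[
\begin{aligned}
\sum_{k=1}^{K(t)} \mathbf{1}[R^*_k = i] &= \sum_{\tau=1}^{t} \mathbf{1}[i \notin S_\tau]\, \mathbf{1}[R_\tau = i] \\
&= p(i) \sum_{\tau=1}^{t} \mathbf{1}[i \notin S_\tau] + \sum_{\tau=1}^{t} \mathbf{1}[i \notin S_\tau] \left( \mathbf{1}[R_\tau = i] - p(i) \right).
\end{aligned}
\tag{5.11}
\]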

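It is now plain from (5.9) that

\[
\begin{aligned}
& \lim_{t\to\infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[i \notin S_\tau] \left( \mathbf{1}[R_\tau = i] - p(i) \right) \\
&\quad = \sum_{s \in \Lambda^*(M;\mathcal{N}):\ i \notin s} \lim_{t\to\infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[S_\tau = s] \left( \mathbf{1}[R_\tau = i] - p(i) \right) = 0 \quad \text{a.s.}
\end{aligned}
\tag{5.12}
\]

Next, combining (5.6) and (5.12), we get via (5.11) that

\[ \lim_{t\to\infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[i \notin S_\tau]\, \mathbf{1}[R_\tau = i] = p(i) \sum_{s \in \Lambda^*(M;\mathcal{N}):\ i \notin s} \mu_\pi(s;p) \quad \text{a.s.} \tag{5.13} \]

Using the basic identity (5.10) for each $t = 1, 2, \ldots$, we conclude from (5.13) that

\[
\begin{aligned}
\lim_{t\to\infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[R_\tau \notin S_\tau]
&= \sum_{i=1}^{N} \lim_{t\to\infty} \left( \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[i \notin S_\tau]\, \mathbf{1}[R_\tau = i] \right) \\
&= \sum_{i=1}^{N} p(i) \sum_{s \in \Lambda^*(M;\mathcal{N}):\ i \notin s} \mu_\pi(s;p) \quad \text{a.s.}
\end{aligned}
\]

This last limit yields the expression (5.2) for the miss rate (4.8).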

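To establish (5.3), we observe for each $t = 1, 2, \ldots$ that

\[
\begin{aligned}
\sum_{\tau=1}^{t} \mathbf{1}[R_\tau \notin S_\tau]
&= \sum_{\tau=1}^{t} \sum_{s \in \Lambda^*(M;\mathcal{N})} \mathbf{1}[S_\tau = s] \left( \mathbf{1}[R_\tau \notin s] - \sum_{i \notin s} p(i) \right) \\
&\quad + \sum_{\tau=1}^{t} \sum_{s \in \Lambda^*(M;\mathcal{N})} \mathbf{1}[S_\tau = s] \sum_{i \notin s} p(i).
\end{aligned}
\]

It then follows from (5.9) that

\[ \lim_{t\to\infty} \frac{1}{t} \sum_{\tau=1}^{t} \sum_{s \in \Lambda^*(M;\mathcal{N})} \mathbf{1}[S_\tau = s] \left( \mathbf{1}[R_\tau \notin s] - \sum_{i \notin s} p(i) \right) = 0 \quad \text{a.s.} \]

so that

\[ \lim_{t\to\infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[R_\tau \notin S_\tau] = \sum_{s \in \Lambda^*(M;\mathcal{N})} \left( \lim_{t\to\infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[S_\tau = s] \right) \sum_{i \notin s} p(i) \quad \text{a.s.} \tag{5.14} \]

and the expression (5.3) is obtained under the existence of the limits (5.1). This completes
the proof of Theorem 5.1.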

It is now immediate that the following limit exists a.s., and is given by

\[
\lim_{t\to\infty} \frac{1}{K(t)} \sum_{k=1}^{K(t)} \mathbf{1}[R^*_k = i]
= \frac{\lim_{t\to\infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[i \notin S_\tau]\, \mathbf{1}[R_\tau = i]}{\lim_{t\to\infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[R_\tau \notin S_\tau]}
= \frac{p(i)\, m_\pi(i;p)}{\sum_{j=1}^{N} p(j)\, m_\pi(j;p)} \quad \text{a.s.}
\tag{5.15}
\]

as we note (5.13) and (5.14). The desired conclusion of Theorem 5.2 is readily obtained
from (5.15) once we observe the convergence $\lim_{t\to\infty} K(t) = \infty$ a.s. monotonically, so
that the sequence $\{K(t),\ t = 1, 2, \ldots\}$ a.s. exhausts $\mathbb{N}$, and the a.s. existence of the
limit in (5.15) implies the a.s. existence of the limit (4.9) with limiting value (5.4)-(5.5).


Chapter  6 


Comparing  Popularity  under  the  Independent  Reference 
Model 


As we have in mind to study the strength of locality of reference present in streams of
requests, we first focus on how popularity contributes to locality of reference by considering
the situation where there are no temporal correlations in the stream of requests,
as would be the case under the IRM with popularity pmf p. In this case, the skewness
in the pmf p does act as an indicator of the strength of locality of reference present
in the stream, under the intuition that the more “balanced” the pmf p, the weaker the
locality of reference. This is best appreciated by considering the limiting cases: If p
is extremely unbalanced with $p = (1 - \delta, \varepsilon, \ldots, \varepsilon)$ (with $\delta = (N - 1)\varepsilon$), a reference
to document 1 is likely to be followed by a burst of additional references to document
1 provided $(N - 1)\varepsilon \ll 1 - \delta$. The exact opposite conclusion holds if the popularity
pmf p were uniform, i.e., $p(1) = \cdots = p(N) = \frac{1}{N}$, for then the successive requests
$\{R_t,\ t = 0, 1, \ldots\}$ form a truly random sequence.

We capture the skewness in the popularity vector through the concept of majorization
introduced in Chapter 2. From now on, the majorization comparison $p \prec q$ formalizes
the notion that the IRM with popularity pmf p has less locality of reference than the
IRM with popularity pmf q as this comparison captures the fact that the pmf q is more
skewed than the pmf p. Under the IRM, the folk theorem for the miss rate associated
with a particular eviction policy $\pi$ can be restated as follows: If two IRM streams have
popularity pmfs p and q satisfying $p \prec q$, then it holds that

\[ M_\pi(q) \le M_\pi(p), \tag{6.1} \]

i.e., “the more skewed the popularity pmf, the smaller the miss rate of a cache.” Similarly,
the folk theorem for the output of a cache under the IRM now reads as the comparison
$p^*_\pi \prec p$ in that the output popularity pmf $p^*_\pi$ is indeed more balanced than the
popularity pmf p of the IRM input.
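As a concrete reading of the comparison $p \prec q$, the sketch below tests majorization numerically. It assumes the standard definition (equal totals, and every sum of the k largest components of p is at most the corresponding sum for q), which we take to be the one introduced in Chapter 2; the function name `is_majorized_by` and the sample vectors are illustrative only.

```python
def is_majorized_by(p, q, tol=1e-12):
    """Return True when p ≺ q (q majorizes p), up to a numerical tolerance."""
    if len(p) != len(q) or abs(sum(p) - sum(q)) > tol:
        return False
    top_p = top_q = 0.0
    # Compare the partial sums of the components sorted in decreasing order.
    for a, b in zip(sorted(p, reverse=True), sorted(q, reverse=True)):
        top_p += a
        top_q += b
        if top_p > top_q + tol:
            return False
    return True


uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
print(is_majorized_by(uniform, skewed))   # the uniform pmf is the more balanced one
print(is_majorized_by(skewed, uniform))

# Majorization is only a partial order: neither of these two pmfs majorizes the other.
a = [0.5, 0.25, 0.25]
b = [0.4, 0.4, 0.2]
print(is_majorized_by(a, b), is_majorized_by(b, a))
```

The last pair shows why statements such as (6.1) are made only for pmfs that are comparable in the majorization ordering.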

In this chapter, we first discuss some basic comparisons which are consequences of
majorization comparison between pmf vectors. We then formally establish the folk theorems
for the miss rate and for the output of a cache under the IRM with two well-known
cache replacement policies, namely, the random policy and the policy $A_0$. Results for
more general policies are discussed in Chapter 7 for Random On-demand Replacement
Algorithms, and in Chapter 8 for the LRU and CLIMB policies.

6.1  Entropy  comparison 

Comparison results which are consequences of majorization ordering are essentially
statements concerning the Schur-concavity of certain functionals. We illustrate this idea
with the entropy comparison. Recall that the entropy $H(p)$ of the
pmf p on $\mathcal{N}$ is defined by

\[ H(p) := - \sum_{i=1}^{N} p(i) \log_2 p(i) \tag{6.2} \]

with the convention $t \log_2 t = 0$ for $t = 0$. It is known that the larger the entropy $H(p)$,
the more balanced the pmf p. This concept has been previously used by Fonseca et al.
[34] to capture the strength of locality of reference exhibited through the popularity pmf
of the request stream.

By a classical result of Schur [49, C.1, p. 64] the mapping $x \mapsto -\sum_{i=1}^{N} x_i \log_2 x_i$ is
a Schur-concave function on $\mathbb{R}_+^N$. This leads readily to the following well-known result
[49, D.1, p. 71].

Proposition 6.1 For pmfs p and q on $\mathcal{N}$, it holds that

\[ H(q) \le H(p) \tag{6.3} \]

whenever $p \prec q$.

Thus, majorization provides a stronger notion for comparing the imbalance in the components
of pmfs than the entropy-based comparison (6.3) proposed by Fonseca et al.
[34].
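Proposition 6.1 and the remark that majorization is the stronger notion can be illustrated numerically. The sketch below is only an illustration under our own choice of sample pmfs; the helper names `entropy` and `is_majorized_by` are hypothetical, and majorization is tested through the usual partial-sum criterion.

```python
from itertools import accumulate
from math import log2


def entropy(p):
    """Entropy (6.2) in bits, with the convention t * log2(t) = 0 at t = 0."""
    return -sum(x * log2(x) for x in p if x > 0)


def is_majorized_by(p, q, tol=1e-12):
    """True when p ≺ q: equal totals and dominated top-k partial sums."""
    top_p = accumulate(sorted(p, reverse=True))
    top_q = accumulate(sorted(q, reverse=True))
    return abs(sum(p) - sum(q)) <= tol and all(
        x <= y + tol for x, y in zip(top_p, top_q))


# A comparable pair: p ≺ q, hence H(q) <= H(p) by Proposition 6.1.
p = [0.4, 0.3, 0.2, 0.1]
q = [0.7, 0.1, 0.1, 0.1]
print(is_majorized_by(p, q), entropy(q) <= entropy(p))

# An incomparable pair: entropy still ranks a and b, majorization does not.
a = [0.5, 0.25, 0.25]
b = [0.4, 0.4, 0.2]
print(entropy(a) < entropy(b), is_majorized_by(a, b), is_majorized_by(b, a))
```

Entropy, being a scalar, orders every pair of pmfs, whereas $p \prec q$ holds only for some pairs; whenever it does hold, the entropy comparison (6.3) follows.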

6.2  Zipf-like  distributions 

It  has  been  observed  in  a  number  of  studies  that  the  popularity  distribution  of  objects 
in  request  streams  at  Web  caches  is  highly  skewed.  In  [1]  a  good  fit  was  provided  by 
the  Zipf  distribution  according  to  which  the  popularity  of  the  ith  most  popular  object  is 
inversely  proportional  to  its  rank,  namely  1/i. 

In more recent studies [13, 39], “Zipf-like” distributions (sometimes called generalized
Zipf distributions) were found more appropriate;
see [13] (and references therein) for an excellent summary. Such distributions form
a one-parameter family. In our set-up, for $\alpha > 0$, we say that the popularity distribution
p of the $\mathcal{N}$-valued rvs $\{R_t,\ t = 0, 1, \ldots\}$ is Zipf-like with parameter $\alpha$ if

\[ p(i) = \frac{i^{-\alpha}}{C_\alpha(N)}, \quad i = 1, \ldots, N \tag{6.4} \]

with

\[ C_\alpha(N) := \sum_{i=1}^{N} i^{-\alpha}. \tag{6.5} \]

The pmf (6.4) will be denoted by $p_\alpha$. It is always the case that

\[ p_\alpha(1) > p_\alpha(2) > \ldots > p_\alpha(N). \tag{6.6} \]

The case $\alpha = 1$ corresponds to the standard Zipf distribution and the value of $\alpha$ was
typically found to be in the range 0.64 to 0.83 [13].

Zipf-like pmfs are skewed towards the most popular objects. As $\alpha \to 0$, the Zipf-like
pmf approaches the uniform distribution u while as $\alpha \to \infty$, it degenerates to the
pmf $(1, 0, \ldots, 0)$. Extrapolating between these extreme cases, we expect the parameter
$\alpha$ of Zipf-like pmfs (6.4)-(6.5) to measure the strength of skewness, with the larger $\alpha$,
the more skewed the pmf $p_\alpha$. The next result shows that majorization indeed captures
this fact, and so it is warranted to call $\alpha$ the skewness parameter of the Zipf-like pmf.

Lemma 6.2 For $0 < \alpha < \beta$, it holds that $p_\alpha \prec p_\beta$.

Lemma 6.2 can already be found in [49, B.2.b, p. 130] and is an easy by-product
of Lemma 2.4. Zipf-like distributions will be used in the discussion of the LRU and
CLIMB policies in Chapter 8.
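Lemma 6.2 is easy to probe numerically. The following sketch builds the Zipf-like pmf (6.4)-(6.5) and checks the majorization comparison for the two ends of the range of $\alpha$ reported in [13]; the catalogue size N = 50 and the function names are arbitrary illustrative choices, not values from the text.

```python
from itertools import accumulate


def zipf_like(alpha, n):
    """Zipf-like pmf (6.4): p(i) proportional to i ** (-alpha), i = 1..n."""
    weights = [i ** (-alpha) for i in range(1, n + 1)]
    c = sum(weights)                      # normalizing constant (6.5)
    return [w / c for w in weights]


def is_majorized_by(p, q, tol=1e-12):
    """True when p ≺ q: equal totals and dominated top-k partial sums."""
    top_p = accumulate(sorted(p, reverse=True))
    top_q = accumulate(sorted(q, reverse=True))
    return abs(sum(p) - sum(q)) <= tol and all(
        x <= y + tol for x, y in zip(top_p, top_q))


p_low = zipf_like(0.64, 50)
p_high = zipf_like(0.83, 50)
print(is_majorized_by(p_low, p_high))    # expected True by Lemma 6.2
print(is_majorized_by(p_high, p_low))
```

A numerical check of this kind is of course no substitute for the proof via Lemma 2.4; it only confirms the ordering for the chosen parameters.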

6.3  Comparing  input  and  output 

In the following two sections, we establish basic comparison results which provide the
first step toward formalizing the folk theorem for the output of a cache. We begin with
a comparison between the input popularity pmf and the output popularity pmf for a
general caching policy.

Theorem 6.3 Consider an eviction policy $\pi$ such that the limits (5.1) exist under the
IRM with popularity pmf p.

(i) If $m_\pi(i;p) \le m_\pi(j;p)$ whenever $p(i) < p(j)$ for distinct $i, j = 1, \ldots, N$, then it
holds that $p \prec p^*_\pi$;

(ii) If $m_\pi(i;p) \ge m_\pi(j;p)$ whenever $p(i)\, m_\pi(i;p) < p(j)\, m_\pi(j;p)$ for distinct
$i, j = 1, \ldots, N$, then it holds that $p^*_\pi \prec p$ provided $m_\pi(i;p) > 0$ for each $i = 1, \ldots, N$.

Proof. Under the enforced assumptions, both claims are simple consequences of
Lemma 2.4: For Claim (i), we use $x = p$ and y given by $y_i = p(i)\, m_\pi(i;p)$, $i = 1, \ldots, N$.
Note that x is the pmf p itself while y normalizes to $p^*_\pi$, and that the monotonicity
assumptions hold.

For Claim (ii), we take $y = p$ and x given by $x_i = p(i)\, m_\pi(i;p)$, $i = 1, \ldots, N$. This
time, x normalizes to $p^*_\pi$ while y is the pmf p, and the requisite monotonicity assumptions hold.


Theorem 6.3 suggests the following definitions: We say that the caching algorithm
$\pi$ is bad if it has the property that the fraction of time that a document is not in cache
increases as its popularity increases, i.e., for every admissible pmf p, it holds that
$m_\pi(i;p) \le m_\pi(j;p)$ whenever $p(i) < p(j)$ for distinct $i, j = 1, \ldots, N$. For a bad
caching algorithm, Claim (i) states that the popularity pmf of the output is more skewed
than the popularity pmf of the input, or equivalently that the output stream displays
stronger locality of reference than the input stream.

The assumptions for Claim (ii) ensure that $m_\pi(i;p) \le m_\pi(j;p)$ and $p(j) < p(i)$
occur simultaneously for distinct $i, j = 1, \ldots, N$. This leads to defining a caching algorithm
$\pi$ as good if for every admissible pmf p, we have $m_\pi(i;p) \le m_\pi(j;p)$ whenever
$p(j) < p(i)$ for distinct $i, j = 1, \ldots, N$. Thus, a caching policy which satisfies the
assumptions of Claim (ii) is necessarily a good policy. However, as we shall see in the
case of the LRU and CLIMB policies [Chapter 8], this by itself is not sufficient to ensure
that the output popularity pmf is more balanced than the input popularity pmf.


6.4  A  useful  comparison 

Repeatedly we will encounter output pmfs which assume the generic form used in
Theorem 6.4 below.

Theorem 6.4 Let p be an admissible pmf on $\mathcal{N}$, and for each $i = 1, \ldots, N$, define the
$(N - 1)$-dimensional vector

\[ p^{(i)} := (p(1), \ldots, p(i-1), p(i+1), \ldots, p(N)). \tag{6.7} \]

For each $M = 1, 2, \ldots, N - 1$, the pmf $p^*_M$ on $\mathcal{N}$ defined by

\[ p^*_M(i) := \frac{p(i)\, E_{M,N-1}(p^{(i)})}{\sum_{j=1}^{N} p(j)\, E_{M,N-1}(p^{(j)})}, \quad i = 1, \ldots, N \tag{6.8} \]

satisfies the comparison $p^*_M \prec p$ where the elementary symmetric function
$E_{M,N-1} : \mathbb{R}^{N-1} \to \mathbb{R}$ is defined at (2.7).


Proof. Fix distinct $i, j = 1, \ldots, N$ and define the $(N - 2)$-dimensional vector $p^{(i,j)}$
obtained from the pmf p by deleting the components associated with documents i and
j. Writing $s = \{i_1, \ldots, i_M\}$ for the generic element of $\Lambda^*(M;\mathcal{N})$, we find

\[
\begin{aligned}
E_{M,N-1}(p^{(i)}) - E_{M,N-1}(p^{(j)})
&= \sum_{s:\ i \notin s} p(i_1) \cdots p(i_M) - \sum_{s:\ j \notin s} p(i_1) \cdots p(i_M) \\
&= \sum_{s:\ i \notin s,\ j \in s} p(i_1) \cdots p(i_M) - \sum_{s:\ j \notin s,\ i \in s} p(i_1) \cdots p(i_M) \\
&= (p(j) - p(i))\, E_{M-1,N-2}(p^{(i,j)}).
\end{aligned}
\tag{6.9}
\]

On the other hand, we also have

\[
\begin{aligned}
p(i)\, E_{M,N-1}(p^{(i)}) - p(j)\, E_{M,N-1}(p^{(j)})
&= p(i) \left( \sum_{s:\ i \notin s} p(i_1) \cdots p(i_M) \right) - p(j) \left( \sum_{s:\ j \notin s} p(i_1) \cdots p(i_M) \right) \\
&= p(i) \left( \sum_{s:\ i \notin s,\ j \notin s} p(i_1) \cdots p(i_M) \right) - p(j) \left( \sum_{s:\ j \notin s,\ i \notin s} p(i_1) \cdots p(i_M) \right) \\
&= (p(i) - p(j))\, E_{M,N-2}(p^{(i,j)}).
\end{aligned}
\tag{6.10}
\]

As we have in mind to apply Lemma 2.4, we take $y = p$ and x given by
$x_i = p(i)\, E_{M,N-1}(p^{(i)})$, $i = 1, \ldots, N$, whence x normalizes to $p^*_M$ while y is the pmf p.
For distinct $i, j = 1, \ldots, N$, we find from (6.9) and (6.10) that

\[ \frac{x_i}{y_i} - \frac{x_j}{y_j} = (p(j) - p(i))\, E_{M-1,N-2}(p^{(i,j)}) \le 0 \]

whenever

\[ x_i - x_j = (p(i) - p(j))\, E_{M,N-2}(p^{(i,j)}) > 0. \]

The assumptions of Lemma 2.4 are satisfied and the comparison $p^*_M \prec p$ follows. ■


6.5  The  random  policy 

In the last two sections, we formalize the folk theorems under the IRM for the miss rate
and the output of a cache under the random policy and the policy $A_\sigma$, respectively.

According to the random policy, when the cache is full, the document to be evicted
from the cache is selected randomly according to the uniform distribution. When the
input to the cache is the IRM with popularity pmf p, the cache states $\{S_t,\ t = 0, 1, \ldots\}$
form a stationary ergodic Markov chain over the finite state space $\Lambda^*(M;\mathcal{N})$ [2, Thm.
11, p. 132]. Its stationary distribution is given by

\[ \mu_{\mathrm{Rand}}(s;p) = E_{M,N}(p)^{-1}\, p(i_1) \cdots p(i_M) \tag{6.11} \]

for every $s = \{i_1, \ldots, i_M\}$ in $\Lambda^*(M;\mathcal{N})$ with normalizing constant $E_{M,N}(p)$ defined at
(2.7).

6.5.1  The  miss  rate  under  the  random  policy 


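Under the IRM with popularity pmf p, the corresponding miss rate is obtained from
(5.3) and (6.11) (see also [2, Thm. 11, p. 132]) as

\[ M_{\mathrm{Rand}}(p) = \frac{\sum_{\{i_1, \ldots, i_M\} \in \Lambda^*(M;\mathcal{N})} p(i_1) \cdots p(i_M) \left( 1 - \sum_{k=1}^{M} p(i_k) \right)}{\sum_{\{i_1, \ldots, i_M\} \in \Lambda^*(M;\mathcal{N})} p(i_1) \cdots p(i_M)}. \tag{6.12} \]

That (6.1) indeed holds for the random policy is contained in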


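Theorem 6.5 For admissible pmfs p and q on $\mathcal{N}$, it holds that

\[ M_{\mathrm{Rand}}(q) \le M_{\mathrm{Rand}}(p) \tag{6.13} \]

whenever $p \prec q$.

Proof. First, we note that

\[ \sum_{\{i_1, \ldots, i_M\} \in \Lambda^*(M;\mathcal{N})} p(i_1) \cdots p(i_M) = E_M(p). \tag{6.14} \]

It is also a simple matter to see that

\[
\begin{aligned}
\sum_{\{i_1, \ldots, i_M\} \in \Lambda^*(M;\mathcal{N})} p(i_1) \cdots p(i_M) \left( 1 - \sum_{k=1}^{M} p(i_k) \right)
&= \sum_{\{i_1, \ldots, i_M\} \in \Lambda^*(M;\mathcal{N})} p(i_1) \cdots p(i_M) \cdot \sum_{i \notin \{i_1, \ldots, i_M\}} p(i) \\
&= (M+1) \sum_{\{i_1, \ldots, i_{M+1}\} \in \Lambda^*(M+1;\mathcal{N})} p(i_1) \cdots p(i_{M+1}) \\
&= (M+1)\, E_{M+1}(p).
\end{aligned}
\tag{6.15}
\]

Combining (6.14) and (6.15) through (6.12), we get

\[ M_{\mathrm{Rand}}(p) = (M+1)\, \frac{E_{M+1}(p)}{E_M(p)}, \tag{6.16} \]

and the miss rate $M_{\mathrm{Rand}}(p)$ is Schur-concave in p by Proposition 2.6. ■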
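The identity between (6.12) and (6.16) lends itself to a direct numerical check. The sketch below evaluates both expressions by brute-force enumeration of the M-subsets, which is practical only for small N; the function names and sample pmfs are our own illustrative choices.

```python
from itertools import combinations
from math import prod


def esp(x, m):
    """Elementary symmetric function: sum of the products over all m-subsets of x."""
    return sum(prod(c) for c in combinations(x, m))


def miss_rate_random_direct(p, m):
    """Miss rate under the random policy, evaluated from (6.12)."""
    num = sum(prod(c) * (1.0 - sum(c)) for c in combinations(p, m))
    return num / esp(p, m)


def miss_rate_random(p, m):
    """Miss rate under the random policy, evaluated from the closed form (6.16)."""
    return (m + 1) * esp(p, m + 1) / esp(p, m)


# Two pmfs with p ≺ q (q is the more skewed one) and a cache of size M = 2.
p = [0.4, 0.3, 0.2, 0.1]
q = [0.7, 0.1, 0.1, 0.1]
print(miss_rate_random(p, 2), miss_rate_random_direct(p, 2))
print(miss_rate_random(q, 2) <= miss_rate_random(p, 2))   # Theorem 6.5
```

For the uniform pmf on N documents, (6.16) reduces to $1 - M/N$, which provides a quick sanity check of the implementation.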


Under  the  IRM,  it  is  well  known  [2,  p.  132]  that  the  FIFO  policy  yields  the  same 
miss  rate  as  the  random  policy,  so  that  Theorem  6.5  holds  for  the  FIFO  policy  as  well. 

In the special case $M = 1$, any demand-driven policy reduces to the policy that evicts
the only document in cache if the requested document is not in cache. Specializing the
results for the random policy, Theorem 6.5 immediately leads to

Corollary 6.6 With $M = 1$, for admissible pmfs p and q, it holds that

\[ M_\pi(q) \le M_\pi(p) \]

whenever $p \prec q$ under any demand-driven replacement policy $\pi$.


6.5.2  The  output  under  the  random  policy 

Substituting (6.11) into (5.5), we readily conclude that

\[ m_{\mathrm{Rand}}(i;p) = E_{M,N}(p)^{-1} \sum_{s \in \Lambda^*(M;\mathcal{N}):\ i \notin s} p(i_1) \cdots p(i_M) = \frac{E_{M,N-1}(p^{(i)})}{E_{M,N}(p)} \tag{6.17} \]

where $p^{(i)}$ is the $(N - 1)$-dimensional vector (6.7) obtained from the pmf p by deleting
the component associated with document i. Consequently, (5.4) yields the output
popularity distribution as

\[ p^*_{\mathrm{Rand}}(i) = \frac{p(i)\, E_{M,N-1}(p^{(i)})}{\sum_{j=1}^{N} p(j)\, E_{M,N-1}(p^{(j)})}, \quad i = 1, \ldots, N \tag{6.18} \]

and Theorem 6.4 immediately implies

Theorem 6.7 Under the random policy, it holds that $p^*_{\mathrm{Rand}} \prec p$.

As in the case of the miss rate, for the special case $M = 1$, by specializing the results
for the random policy, the output pmf is given by

\[ p^*(i) = \frac{p(i)(1 - p(i))}{\sum_{j=1}^{N} p(j)(1 - p(j))}, \quad i = 1, \ldots, N \tag{6.19} \]

and Theorem 6.7 readily yields

Corollary 6.8 With $M = 1$, under any demand-driven replacement policy $\pi$, the popularity
pmf $p^*_\pi$ of the output is the pmf $p^*$ given at (6.19) with $p^* \prec p$.
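The output pmf (6.18) and the comparison of Theorem 6.7 can be checked on a small example. The sketch below is illustrative only: the names `output_pmf_random` and `is_majorized_by` and the sample pmf are assumptions of ours, and the elementary symmetric functions are computed by enumeration, which is suitable for small N only.

```python
from itertools import accumulate, combinations
from math import prod


def esp(x, m):
    """Elementary symmetric function: sum of the products over all m-subsets of x."""
    return sum(prod(c) for c in combinations(x, m))


def output_pmf_random(p, m):
    """Output popularity pmf (6.18) under the random policy with cache size m."""
    # p[:i] + p[i + 1:] is the vector p^(i) of (6.7), with document i removed.
    weights = [p[i] * esp(p[:i] + p[i + 1:], m) for i in range(len(p))]
    total = sum(weights)
    return [w / total for w in weights]


def is_majorized_by(p, q, tol=1e-12):
    """True when p ≺ q: equal totals and dominated top-k partial sums."""
    top_p = accumulate(sorted(p, reverse=True))
    top_q = accumulate(sorted(q, reverse=True))
    return abs(sum(p) - sum(q)) <= tol and all(
        x <= y + tol for x, y in zip(top_p, top_q))


p = [0.4, 0.3, 0.2, 0.1]
p_star = output_pmf_random(p, 2)
print(p_star)
print(is_majorized_by(p_star, p))   # Theorem 6.7: the output is more balanced
```

With `m = 1` the same function reproduces (6.19), since $E_{1,N-1}(p^{(i)}) = 1 - p(i)$.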


6.6  The  policy  $A_\sigma$

Let $\sigma$ denote a permutation of $\{1, \ldots, N\}$ which is held fixed throughout this section.
Such a permutation can be used to induce an ordering of the documents by considering
that the documents $\sigma(1), \sigma(2), \ldots, \sigma(N)$ are “ordered” in decreasing order. With
this ranking of the documents, the policy $A_\sigma$ can be defined as in Section 4.3 with the
eviction rule (4.6).

6.6.1  Cache  steady  state  under  the  policy  $A_\sigma$

Under (4.3), every document is eventually requested with probability one, so that for
sufficiently large time t, the cache $S_t$ under the replacement policy $A_\sigma$ is of the form

\[ S_t = \Sigma + Y_t^\sigma \tag{6.20} \]

with

\[ \Sigma := \{\sigma(1), \sigma(2), \ldots, \sigma(M-1)\} \tag{6.21} \]

and

\[ Y_t^\sigma \in \Sigma^c = \{\sigma(M), \ldots, \sigma(N)\}. \tag{6.22} \]

As explained earlier, there is then no loss of generality in assuming that the cache is
indeed of the form (6.20)-(6.22), in which case the cache state $S_t$ is determined completely
by $Y_t^\sigma$. Under the IRM, the rvs $\{Y_t^\sigma,\ t = 0, 1, \ldots\}$ form a stationary ergodic
Markov chain over the finite state space $\Sigma^c$ with stationary distribution $\{\pi_\sigma(y),\ y \in \Sigma^c\}$
described in the following lemma.

Lemma 6.9 The limits

\[ \lim_{t\to\infty} P[Y_t^\sigma = y,\ R_t = x] = \pi_\sigma(y)\, p(x), \quad (x, y) \in \mathcal{N} \times \Sigma^c \]

exist with

\[ \pi_\sigma(y) = \lim_{t\to\infty} P[Y_t^\sigma = y] = \frac{p(y)}{\sum_{x \notin \Sigma} p(x)}. \tag{6.23} \]

The proof of Lemma 6.9 is omitted as it mimics the derivation of a similar result for
the policy $A_0$ [24, Thm. 6.3, p. 268]. Note that (6.23) defines a pmf $\pi_\sigma$ on $\Sigma^c$, which is
simply the conditional pmf induced on $\Sigma^c$ by the pmf p.

6.6.2  The  miss  rate  under  the  policy  $A_\sigma$


Under the IRM with popularity pmf p, it follows from Lemma 6.9 and the expression
(5.3) that the miss rate under the policy $A_\sigma$ is given [24, Thm. 6.4, p. 269] by

\[ M_{A_\sigma}(p) = \sum_{i=M}^{N} p(\sigma(i)) - \frac{\sum_{i=M}^{N} p(\sigma(i))^2}{\sum_{i=M}^{N} p(\sigma(i))}. \tag{6.24} \]

From the expression (6.24), it is not hard to see that the folk theorem (6.1) for miss rates
under the policy $A_\sigma$ does not hold in general. However, it does hold under a well-known
instance of the policy $A_\sigma$, the policy $A_0$, defined earlier in Section 4.3. This policy $A_0$ is
simply the policy $A_{\sigma^*}$ where the permutation $\sigma^*$ of $\{1, \ldots, N\}$ orders the components of
the underlying pmf p in decreasing order, i.e., $p(\sigma^*(1)) \ge p(\sigma^*(2)) \ge \ldots \ge p(\sigma^*(N))$.
The analog of Theorem 6.5 for the policy $A_0$ is given in

Theorem 6.10 For admissible pmfs p and q on $\mathcal{N}$, it holds that

\[ M_{A_0}(q) \le M_{A_0}(p) \tag{6.25} \]

whenever $p \prec q$.

Proof. The policy $A_0$ is known [2, 24] to minimize the miss rate for the IRM amongst
a large class of demand-driven policies, including the policies (4.6). In particular, we
have

\[ M_{A_0}(p) = \min_{i = 1, \ldots, N!} M_{A_{\sigma_i}}(p) \tag{6.26} \]

where $\{\sigma_i,\ i = 1, \ldots, N!\}$ is a collection of all permutations of $\{1, \ldots, N\}$. Furthermore,
for any permutation $\sigma$ of $\{1, \ldots, N\}$, we can rewrite (6.24) as

\[
\begin{aligned}
M_{A_\sigma}(p)
&= \frac{\left( \sum_{i=M}^{N} p(\sigma(i)) \right)^2 - \sum_{i=M}^{N} p(\sigma(i))^2}{\sum_{i=M}^{N} p(\sigma(i))} \\
&= \frac{\sum_{i=M}^{N} \sum_{j=M,\ j \ne i}^{N} p(\sigma(i))\, p(\sigma(j))}{\sum_{i=M}^{N} p(\sigma(i))} \\
&= \frac{2\, E_2(\ell \cdot \sigma(p))}{E_1(\ell \cdot \sigma(p))} \\
&= 2\, \phi_2(\ell \cdot \sigma(p))
\end{aligned}
\tag{6.27}
\]

where the element $\ell$ of $\mathbb{R}_+^N$ is specified by $\ell_1 = \ldots = \ell_{M-1} = 0$ and $\ell_M = \ldots = \ell_N = 1$.

The mapping $h : \mathbb{R}^{N!} \to \mathbb{R} : y \mapsto \min(y_1, \ldots, y_{N!})$ is clearly increasing, symmetric
and concave, while the mapping $\phi_2$ is concave on $\mathbb{R}_+^N$ by Proposition 2.6. Combining
these facts with (6.26) and (6.27), we conclude by Proposition 2.8 that the miss
rate functional under the policy $A_0$ is indeed Schur-concave in the pmf vector and the
desired result follows. ■

Without surprise, Corollary 6.6 also follows from Theorem 6.10 (with $M = 1$).
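The contrast between a fixed ranking $\sigma$ and the popularity-based ranking of $A_0$ can be seen on a three-document example. The sketch below evaluates (6.24) directly; the function names, the representation of $\sigma$ as a list of document ids, and the sample pmfs are illustrative assumptions of ours (the pair satisfies $p \prec q$).

```python
from itertools import permutations


def miss_rate_A(p, order, m):
    """Miss rate (6.24) under A_sigma.

    p     : list of popularities; document d has popularity p[d - 1]
    order : list [sigma(1), ..., sigma(N)] of document ids
    m     : cache size M
    """
    tail = [p[d - 1] for d in order[m - 1:]]   # popularities of sigma(M..N)
    total = sum(tail)
    return total - sum(x * x for x in tail) / total


def miss_rate_A0(p, m):
    """Miss rate under A_0: sigma ranks the documents by decreasing popularity."""
    order = sorted(range(1, len(p) + 1), key=lambda d: -p[d - 1])
    return miss_rate_A(p, order, m)


p = [0.2, 0.4, 0.4]
q = [0.02, 0.49, 0.49]          # q is more skewed than p
identity = [1, 2, 3]            # a fixed ranking that pins document 1 in cache

# Under the fixed ranking the more skewed pmf has the LARGER miss rate.
print(miss_rate_A(p, identity, 2), miss_rate_A(q, identity, 2))
# Under A_0 the ordering (6.25) is recovered.
print(miss_rate_A0(p, 2), miss_rate_A0(q, 2))
```

Here the fixed ranking keeps the least popular document permanently in cache, which is why skewing the popularity further away from it hurts; $A_0$ avoids this by adapting the ranking to p, in line with (6.26).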


6.6.3  The  output  under  the  policy  Aa 

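From the expression of $\{\pi_\sigma(y),\ y \in \Sigma^c\}$ provided in Lemma 6.9, we obtain

\[ m_\sigma(i;p) = \begin{cases} 0 & \text{if } i \in \Sigma \\ 1 - \pi_\sigma(i) & \text{if } i \notin \Sigma \end{cases} \]

and Theorem 5.2 yields the output popularity distribution $p^*_\sigma$ as

\[ p^*_\sigma(i) = \begin{cases} 0 & \text{if } i \in \Sigma \\ \dfrac{p(i)(1 - \pi_\sigma(i))}{\sum_{j \notin \Sigma} p(j)(1 - \pi_\sigma(j))} & \text{if } i \notin \Sigma. \end{cases} \tag{6.28} \]

Since $p^*_\sigma(i) = 0$ whenever i belongs to $\Sigma$, it is more natural to seek a comparison
between $p^*_\sigma$ (viewed as a pmf on $\Sigma^c$) and the conditional pmf $\pi_\sigma$.

Theorem 6.11 Under the policy $A_\sigma$, it holds that $p^*_\sigma \prec \pi_\sigma$.

Proof. We rewrite $p^*_\sigma$ in (6.28) as a function of $\pi_\sigma$ by dividing its numerator and
denominator by $\sum_{j \notin \Sigma} p(j)$. This yields

\[ p^*_\sigma(i) = \frac{\pi_\sigma(i)(1 - \pi_\sigma(i))}{\sum_{j \notin \Sigma} \pi_\sigma(j)(1 - \pi_\sigma(j))}, \quad i \notin \Sigma. \]

With Lemma 2.4 in mind, we take x and y to be the elements of $\mathbb{R}^{N-M+1}$ given by
$y = \pi_\sigma$ and $x_i = \pi_\sigma(i)(1 - \pi_\sigma(i))$, $i \notin \Sigma$, in which case

\[ \frac{y_i}{x_i} = (1 - \pi_\sigma(i))^{-1}. \tag{6.29} \]

Pick distinct i and j not in $\Sigma$. From (6.29), we see that $\frac{y_i}{x_i} > \frac{y_j}{x_j}$ if and only if
$\pi_\sigma(i) > \pi_\sigma(j)$, and the assumptions of Lemma 2.4 will hold if we can show that $x_i \ge x_j$
whenever $\pi_\sigma(i) > \pi_\sigma(j)$. The analysis proceeds along two cases:

Case (a) - Assume $\pi_\sigma(i) \le 1/2$. With $1/2 \ge \pi_\sigma(i) > \pi_\sigma(j)$, we find

\[ x_i = \pi_\sigma(i)(1 - \pi_\sigma(i)) \ge \pi_\sigma(j)(1 - \pi_\sigma(j)) = x_j \]

by the increasing monotonicity of the mapping $p \mapsto p(1 - p)$ on the interval $[0, \frac{1}{2}]$.

Case (b) - Assume $\pi_\sigma(i) > 1/2$, in which case $1/2 > 1 - \pi_\sigma(i) \ge \pi_\sigma(j)$ since
$\sum_{k \notin \Sigma} \pi_\sigma(k) = 1$. We readily arrive at the conclusion $x_i \ge x_j$ by applying the argument
in Case (a) to $1 - \pi_\sigma(i)$ and $\pi_\sigma(j)$.

The assumptions of Lemma 2.4 are satisfied and we get the desired result, with
x normalizing to $p^*_\sigma$ and $y = \pi_\sigma$. ■

Corollary 6.8 is also obtained from Theorem 6.11 (with $M = 1$) as expected.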
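Theorem 6.11 compares two pmfs supported on $\Sigma^c$, and both are simple to compute from (6.23) and (6.28). The sketch below does so for one ranking; the function names, the list representation of $\sigma$ and the sample pmf are illustrative assumptions rather than notation from the text.

```python
from itertools import accumulate


def output_under_A(p, order, m):
    """Return (pi_sigma, p_star_sigma) as lists indexed by sigma(M), ..., sigma(N).

    p     : list of popularities; document d has popularity p[d - 1]
    order : list [sigma(1), ..., sigma(N)] of document ids
    m     : cache size M
    """
    tail = [p[d - 1] for d in order[m - 1:]]
    total = sum(tail)
    pi = [x / total for x in tail]               # conditional pmf (6.23)
    weights = [x * (1.0 - x) for x in pi]        # numerators in the proof of Thm 6.11
    z = sum(weights)
    return pi, [w / z for w in weights]          # output pmf (6.28) on Sigma^c


def is_majorized_by(p, q, tol=1e-12):
    """True when p ≺ q: equal totals and dominated top-k partial sums."""
    top_p = accumulate(sorted(p, reverse=True))
    top_q = accumulate(sorted(q, reverse=True))
    return abs(sum(p) - sum(q)) <= tol and all(
        x <= y + tol for x, y in zip(top_p, top_q))


p = [0.4, 0.3, 0.2, 0.1]
pi, p_star = output_under_A(p, [1, 2, 3, 4], 2)   # document 1 is pinned in cache
print(pi, p_star)
print(is_majorized_by(p_star, pi))                # Theorem 6.11
```

Note that the comparison is with the conditional pmf $\pi_\sigma$ on $\Sigma^c$ and not with p itself, since the pinned documents $\sigma(1), \ldots, \sigma(M-1)$ never appear in the output.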




Chapter  7 


Random  On-demand  Replacement  Algorithms  (RORA) 


We  now  introduce  a  large  class  of  demand-driven  eviction  policies  called  Random  On- 
demand  Replacement  Algorithms  (RORA),  and  show  that  the  folk  theorems  for  the  miss 
rate  and  the  output  of  a  cache  hold  under  this  class  of  policies  when  the  input  to  the 
cache is the IRM. This class of policies generalizes many well-known caching policies,
e.g., the random and FIFO policies, as well as the optimal policy $A_0$. Moreover, the
Partially Preloaded Random Replacement Algorithms proposed by Gelenbe [35] form a
subclass of RORAs.


7.1  Defining  RORAs 

A RORA policy follows the demand-driven caching rule (4.4) (under the customary
assumption that the cache is initially full) and is characterized by an eviction/insertion
pmf r on $\{1, \ldots, M\} \times \{1, \ldots, M\}$ which we organize as the $M \times M$ matrix $r = (r_{k\ell})$,
i.e., for each $k, \ell = 1, \ldots, M$, we have $r_{k\ell} \ge 0$ and $\sum_{k=1}^{M} \sum_{\ell=1}^{M} r_{k\ell} = 1$. The RORA
associated with the pmf matrix r is denoted RORA(r), and often referred to as the
RORA(r) policy.

We select the cache state $\Omega_t$ at time t to be an element $(i_1, \ldots, i_M)$ of $\Lambda(M;\mathcal{N})$ with
the understanding that document $i_k$ is in cache at position $k = 1, \ldots, M$, at time t. The
RORA(r) policy implements the following eviction rule: Introduce a sequence of i.i.d.
rvs $\{(X_t, Y_t),\ t = 0, 1, \ldots\}$ taking values in $\{1, \ldots, M\} \times \{1, \ldots, M\}$ with common
pmf r, i.e., for each $t = 0, 1, \ldots$, we have

\[ P[(X_t, Y_t) = (k, \ell)] = r_{k\ell}, \quad k, \ell = 1, \ldots, M. \]

The sequences of rvs $\{(X_t, Y_t),\ t = 0, 1, \ldots\}$ and $\{R_t,\ t = 0, 1, \ldots\}$ are assumed
mutually independent. The document $U_t$ to be evicted at time t is given by

\[ U_t = \mathbf{1}[R_t \notin S_t]\, i_{X_t}. \]

We have $U_t = 0$ whenever the requested document is in the cache (i.e., $R_t \in S_t$), in line
with the convention that no replacement occurs and the cache state remains unchanged,
i.e., $\Omega_{t+1} = \Omega_t$.

Next, if the requested document is not in the cache (i.e., $R_t \notin S_t$) and $(X_t, Y_t) = (k, \ell)$,
then $U_t = i_k$, i.e., the document at position k is evicted, and the new document is
inserted in the cache at position $\ell$. If $k < \ell$, the documents $i_{k+1}, \ldots, i_\ell$ are shifted down
to positions $k, k+1, \ldots, \ell - 1$ (in that order) while if $k > \ell$, the documents $i_\ell, \ldots, i_{k-1}$
are shifted up to positions $\ell + 1, \ldots, k$ (in that order). When $k = \ell$, the new document
simply replaces the evicted document at position k.
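The eviction and insertion rule just described amounts to removing the entry at position k and re-inserting the new document at position $\ell$, the shifts being a by-product. The sketch below implements one transition of the cache state; the function name `rora_step`, the dictionary encoding of the matrix r, and the use of Python's `random.choices` to draw $(X_t, Y_t)$ are illustrative assumptions.

```python
import random


def rora_step(state, request, r, rng=random):
    """One transition of a RORA(r) cache.

    state   : tuple (i_1, ..., i_M); document i_k sits at position k
    request : the requested document R_t
    r       : dict mapping (k, l), with 1-based positions, to the probability r_kl
    Returns (new_state, evicted), with evicted = 0 on a hit.
    """
    if request in state:
        return state, 0                      # hit: no replacement occurs
    pairs = list(r)
    # Draw (X_t, Y_t) = (k, l) according to the pmf r.
    k, l = rng.choices(pairs, weights=[r[pair] for pair in pairs])[0]
    cache = list(state)
    evicted = cache.pop(k - 1)               # evict the document at position k
    cache.insert(l - 1, request)             # the new document lands at position l
    return tuple(cache), evicted


# Deterministic examples with M = 4 (a single pair carries all the mass).
print(rora_step((1, 2, 3, 4), 9, {(1, 3): 1.0}))   # k < l: documents 2, 3 shift down
print(rora_step((1, 2, 3, 4), 9, {(4, 2): 1.0}))   # k > l: documents 2, 3 shift up
print(rora_step((1, 2, 3, 4), 3, {(1, 3): 1.0}))   # hit: state unchanged
```

Removing position k before inserting at position $\ell$ produces exactly the shifts described above in both cases $k < \ell$ and $k > \ell$, and reduces to an in-place replacement when $k = \ell$.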

Observe that the document initially at position i in the cache will never be replaced
if

\[ r_{k\ell} = 0 \quad \text{for} \quad \begin{cases} \text{all } k = 1, \ldots, i \text{ and } \ell = i, \ldots, M \\ \text{and} \\ \text{all } \ell = 1, \ldots, i \text{ and } k = i, \ldots, M. \end{cases} \tag{7.1} \]

If we use row i and column i to partition the matrix r into four blocks, then condition
(7.1) expresses the fact that the entries in the northwest and southeast corners (with the
understanding that the position of $r_{11}$ is at the lower left corner of the matrix r) all vanish
(including row i and column i). Let $\Sigma_r$ denote the set of positions in the cache with the
property that any document initially put there will never be evicted during the operation
of the cache, i.e.,

\[ \Sigma_r := \{ i = 1, \ldots, M :\ \text{Eqn. (7.1) holds at } i \}. \tag{7.2} \]

Under the IRM with popularity pmf p, the cache states $\{\Omega_t,\ t = 0, 1, \ldots\}$ form a
Markov chain on the state space $\Lambda(M;\mathcal{N})$. The ergodic properties of this chain are
determined by whether the set $\Sigma_r$ is empty or not. This is done in Lemmas 7.1 and 7.2
in the next two sections. These basic results are established in Appendix A.

Throughout the discussion below we always assume that the cache size M and the
number of cacheable documents N satisfy $M + 1 < N$. We do so in order to avoid
technical cases of limited interest (this is discussed in some detail in Appendix A). In
addition, the input to the cache is assumed to be the IRM.

7.1.1  Case  1

The set $\Sigma_r$ is empty, so that every document in cache is eventually replaced, i.e., for
each $i = 1, \ldots, M$, there exists a pair $k, \ell$ (possibly depending on i) with either
$1 \le k \le i \le \ell \le M$ or $1 \le \ell \le i \le k \le M$ such that

\[ r_{k\ell} > 0. \]

Here are some well-known policies which fall in this case: The random policy corresponds
to RORA(r) with r given by $r_{kk} = \frac{1}{M}$ for each $k = 1, \ldots, M$. The FIFO policy
also belongs to RORA with two possibilities for r, namely $r_{1M} = 1$ or $r_{M1} = 1$. The
first (resp. second) choice corresponds to the cache state $(i_1, \ldots, i_M)$ being loaded from
left to right with documents ordered from the oldest to the most recent (resp. from the
most recent to the oldest).

In  this  case,  the  Markov  chain  =  0,1,...}  is  ergodic  on  the  state  space 

A (M;  A/”);  its  stationary  distribution  exists  and  is  given  in  the  following  lemma. 

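Lemma 7.1 Assume the input to be modeled according to the IRM with popularity pmf $p$. For any RORA($r$) policy in Case 1 with $\Sigma_r$ empty, the cache states $\{\Omega_t,\ t = 0, 1, \ldots\}$ form an ergodic Markov chain on the state space $\Lambda(M; \mathcal{N})$ with stationary pmf on $\Lambda(M; \mathcal{N})$ given by

$$\begin{aligned}
\mu_r(s; p) &= \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[\Omega_\tau = s] \quad \text{a.s.} \\
&= C(p)^{-1}\, p(i_1) p(i_2) \cdots p(i_M)
\end{aligned} \tag{7.3}$$

for every $s = (i_1, \ldots, i_M)$ in $\Lambda(M; \mathcal{N})$ with normalizing constant

$$C(p) := \sum_{(i_1, \ldots, i_M) \in \Lambda(M; \mathcal{N})} p(i_1) p(i_2) \cdots p(i_M). \tag{7.4}$$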


Note  that  the  stationary  pmf  is  the  same  for  all  RORAs  in  Case  1. 

7.1.2  Case  2 

The set $\Sigma_r$ is not empty, and some documents, once put in cache, will never be replaced during the operation of the cache, i.e., if $\Omega_0 = (i_1, \ldots, i_M)$, then for all $t = 1, 2, \ldots$, with $\Omega_t = (j_1, \ldots, j_M)$ we have

$$j_\ell = i_\ell, \quad \ell \in \Sigma_r. \tag{7.5}$$

Here are some examples of RORA policies in that category: For a permutation $\sigma$ of $\{1, \ldots, N\}$, the policy $A_\sigma$ evicts the "smallest" document in cache with documents $\sigma(1), \sigma(2), \ldots, \sigma(N)$ "ordered" in decreasing order. The documents $\sigma(1), \ldots, \sigma(M-1)$, once loaded in the cache, will remain there, and in the steady state, the cache under the policy $A_\sigma$ will contain the documents $\sigma(1), \ldots, \sigma(M-1)$.

This behavior can be recovered through the RORA($r$) policy with matrix $r$ of the form $r_{kk} = 1$ for some $k = 1, \ldots, M$, in which case $\Sigma_r$ has $M - 1$ elements, namely $\{1, \ldots, k-1, k+1, \ldots, M\}$. If the documents $\sigma(1), \ldots, \sigma(M-1)$ are initially put in cache (i.e., preloaded) at the other positions $\ell \neq k$ in $\Sigma_r$, this RORA($r$) policy will behave like the policy $A_\sigma$ in its steady state regime. The steady state behavior of the cache under the policy $A_0$ is that of the RORA($r$) policy above, this time with the preloaded documents being the $M - 1$ most popular documents.

To describe the long-run behavior of the cache states $\{\Omega_t,\ t = 0, 1, \ldots\}$, we go back to (7.5). First, with initial cache state $s_0 = (i_1, \ldots, i_M)$ in $\Lambda(M; \mathcal{N})$, we denote by $\Sigma_r(s_0)$ the set of initial documents with positions in $\Sigma_r$, i.e.,

$$\Sigma_r(s_0) := \{ i_\ell : \ell \in \Sigma_r \}. \tag{7.6}$$

Next, we introduce the component

$$\Lambda(r, s_0) := \{ (j_1, \ldots, j_M) \in \Lambda(M; \mathcal{N}) : j_\ell = i_\ell,\ \ell \in \Sigma_r \}. \tag{7.7}$$

In view of (7.5), once the cache state is in $\Lambda(r, s_0)$, it remains there forever. In fact all the states in the component $\Lambda(r, s_0)$ communicate with each other, and this set of states is closed under the motion of the Markov chain $\{\Omega_t,\ t = 0, 1, \ldots\}$. Given that $|\Sigma_r| = m$, the $M - m$ free positions can be filled with distinct documents chosen among the remaining $N - m$, so there are $\frac{(N-m)!}{(N-M)!}$ elements in $\Lambda(r, s_0)$, and there are $\frac{N!}{(N-m)!}$ distinct components which form a partition of $\Lambda(M; \mathcal{N})$.

As a result, when restricted to $\Lambda(r, s_0)$, this Markov chain is irreducible and aperiodic, and its ergodic behavior can be characterized as follows:

Lemma 7.2 Assume the input to be modeled according to the IRM with popularity pmf $p$. For any RORA($r$) policy in Case 2 with $|\Sigma_r| = m$ and initial cache state $s_0$, the cache states $\{\Omega_t,\ t = 0, 1, \ldots\}$ form an ergodic Markov chain on the component $\Lambda(r, s_0)$. In particular the limit

$$\mu_{r,s_0}(s; p) = \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[\Omega_\tau = s] \quad \text{a.s.} \tag{7.8}$$

always exists for every $s = (i_1, \ldots, i_M)$ in $\Lambda(M; \mathcal{N})$ and is given by

$$\mu_{r,s_0}(s; p) = \begin{cases} C_r(p, s_0)^{-1}\, p(i_1) p(i_2) \cdots p(i_M), & s \in \Lambda(r, s_0) \\ 0, & s \notin \Lambda(r, s_0) \end{cases} \tag{7.9}$$

with normalizing constant

$$C_r(p, s_0) := \sum_{(i_1, \ldots, i_M) \in \Lambda(r, s_0)} p(i_1) p(i_2) \cdots p(i_M). \tag{7.10}$$

From (7.7), we note the simplification

$$\mu_{r,s_0}(s; p) = C'_r(p, s_0)^{-1} \prod_{i_\ell \notin \Sigma_r(s_0)} p(i_\ell) \tag{7.11}$$

for each $s = (i_1, \ldots, i_M)$ in $\Lambda(r, s_0)$ with normalizing constant

$$C'_r(p, s_0) := \sum_{(i_1, \ldots, i_M) \in \Lambda(r, s_0)} \ \prod_{i_\ell \notin \Sigma_r(s_0)} p(i_\ell). \tag{7.12}$$

7.2  The  miss  rate  under  RORAs 
7.2.1  Case  1 

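Fix $s = \{i_1, \ldots, i_M\}$ in $\Lambda^\star(M; \mathcal{N})$, and let $\Lambda(s|M; \mathcal{N})$ denote the subset of $\Lambda(M; \mathcal{N})$ defined by

$$\Lambda(s|M; \mathcal{N}) := \{ (j_1, \ldots, j_M) \in \Lambda(M; \mathcal{N}) : \{j_1, \ldots, j_M\} = \{i_1, \ldots, i_M\} \}. \tag{7.13}$$

By Lemma 7.1, the limit (5.1) exists and is given by

$$\begin{aligned}
\mu^\star_r(s; p) &= \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[S_\tau = s] \quad \text{a.s.} \\
&= \sum_{(j_1, \ldots, j_M) \in \Lambda(s|M; \mathcal{N})} \mu_r((j_1, \ldots, j_M); p) \\
&= C(p)^{-1} M! \cdot p(i_1) p(i_2) \cdots p(i_M)
\end{aligned} \tag{7.14}$$

with normalizing constant $C(p)$ given by (7.4). The last equality at (7.14) follows from the fact that $|\Lambda(s|M; \mathcal{N})| = M!$.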

Using (7.14) in conjunction with Theorem 5.1, we readily conclude that under the RORA($r$) policy of Case 1 the miss rate (4.8) for the IRM exists as a constant which is independent of the initial cache state $s_0$. To acknowledge this fact, we simply denote this limiting constant by $M_r(p)$. Specializing (5.3) leads to

$$\begin{aligned}
M_r(p) &= C(p)^{-1} M! \sum_{\{i_1, \ldots, i_M\} \in \Lambda^\star(M; \mathcal{N})} p(i_1) \cdots p(i_M) \sum_{i \notin \{i_1, \ldots, i_M\}} p(i) \\
&= C(p)^{-1} (M+1)! \sum_{\{i_1, \ldots, i_{M+1}\} \in \Lambda^\star(M+1; \mathcal{N})} p(i_1) \cdots p(i_{M+1}) \\
&= C(p)^{-1} (M+1)! \cdot E_{M+1,N}(p)
\end{aligned} \tag{7.15}$$

while the normalizing constant $C(p)$ given by (7.4) can be simplified as

$$\begin{aligned}
C(p) &= \sum_{(i_1, \ldots, i_M) \in \Lambda(M; \mathcal{N})} p(i_1) \cdots p(i_M) \\
&= M! \sum_{\{i_1, \ldots, i_M\} \in \Lambda^\star(M; \mathcal{N})} p(i_1) \cdots p(i_M) \\
&= M! \cdot E_{M,N}(p).
\end{aligned} \tag{7.16}$$

Combining (7.15) and (7.16), we finally get

$$M_r(p) = (M+1) \cdot \frac{E_{M+1,N}(p)}{E_{M,N}(p)} = (M+1)\, \Phi_{M+1,N}(p) \tag{7.17}$$

and a straightforward application of Proposition 2.6 yields

Theorem 7.3 Under any RORA($r$) policy in Case 1, for admissible pmfs $p$ and $q$ on $\mathcal{N}$, it holds that

$$M_r(q) \le M_r(p) \tag{7.18}$$

whenever $p \prec q$.
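As a numerical illustration of (7.17) and Theorem 7.3, the following minimal Python sketch evaluates the Case 1 miss rate through elementary symmetric polynomials and cross-checks it against a direct average over the stationary pmf (7.3). The function names (`esym`, `rora_case1_miss_rate`, `rora_case1_miss_rate_bruteforce`) and the two example pmfs are illustrative assumptions, not part of the original text; documents are indexed from 0.

```python
from itertools import combinations, permutations
from math import prod

def esym(k, p):
    """Elementary symmetric polynomial: sum over all k-subsets of the products of entries."""
    return sum(prod(c) for c in combinations(p, k))

def rora_case1_miss_rate(p, M):
    """Closed form (7.17): (M + 1) * E_{M+1,N}(p) / E_{M,N}(p)."""
    return (M + 1) * esym(M + 1, p) / esym(M, p)

def rora_case1_miss_rate_bruteforce(p, M):
    """Average the per-state miss probability over the stationary pmf (7.3)."""
    weights = {s: prod(p[i] for i in s) for s in permutations(range(len(p)), M)}
    C = sum(weights.values())  # normalizing constant (7.4)
    # In state s, a request misses with probability 1 - sum of p over cached documents.
    return sum(w / C * (1.0 - sum(p[i] for i in s)) for s, w in weights.items())

# Hypothetical example pmfs with p majorized by q (q is the more skewed one).
p = [0.4, 0.3, 0.2, 0.1]
q = [0.7, 0.1, 0.1, 0.1]
M = 2
print(rora_case1_miss_rate(p, M), rora_case1_miss_rate(q, M))
```

With these inputs the more skewed pmf `q` yields the smaller miss rate, in line with (7.18); for the uniform pmf the formula reduces to $(N-M)/N$.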

7.2.2  Case  2 

Consider now the RORA($r$) policy under Case 2 when the set $\Sigma_r$ is not empty, say with $|\Sigma_r| = m$ for some $m = 1, \ldots, M-1$, and let the cache be initially in state $s_0$ in $\Lambda(M; \mathcal{N})$. By Lemma 7.2, for each $s = \{i_1, \ldots, i_M\}$ in $\Lambda^\star(M; \mathcal{N})$ the limit (5.1) exists and is given by

$$\begin{aligned}
\mu^\star_{r,s_0}(s; p) &= \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[S_\tau = s] \quad \text{a.s.} \\
&= \sum_{s' = (j_1, \ldots, j_M) \in \Lambda(s|r, s_0)} \mu_{r,s_0}(s'; p)
\end{aligned} \tag{7.19}$$

where $\Lambda(s|r, s_0)$ denotes the subset of $\Lambda(r, s_0)$ defined by

$$\Lambda(s|r, s_0) := \{ (j_1, \ldots, j_M) \in \Lambda(r, s_0) : \{j_1, \ldots, j_M\} = \{i_1, \ldots, i_M\} \}. \tag{7.20}$$

The set $\Lambda(s|r, s_0)$ is non-empty if and only if

$$\Sigma_r(s_0) \subseteq \{i_1, \ldots, i_M\} \tag{7.21}$$

and $\mu^\star_{r,s_0}(s; p) = 0$ whenever this inclusion (7.21) does not hold. With this in mind, we define

$$\Lambda^\star(r, s_0) := \{ s = \{i_1, \ldots, i_M\} \in \Lambda^\star(M; \mathcal{N}) : \text{Eqn. (7.21) holds at } s \}. \tag{7.22}$$

Going back to (7.11) and (7.12), we now conclude that for each $s = \{i_1, \ldots, i_M\}$ in $\Lambda^\star(r, s_0)$, it holds

$$\begin{aligned}
\mu^\star_{r,s_0}(s; p) &= C'_r(p, s_0)^{-1} \sum_{(j_1, \ldots, j_M) \in \Lambda(s|r, s_0)} \ \prod_{j_\ell \notin \Sigma_r(s_0)} p(j_\ell) \\
&= C'_r(p, s_0)^{-1} (M-m)! \cdot \prod_{i_\ell \notin \Sigma_r(s_0)} p(i_\ell)
\end{aligned} \tag{7.23}$$

where in the last equality we combine the fact $\{j_1, \ldots, j_M\} = \{i_1, \ldots, i_M\}$ with (7.21), and then make use of the identity $|\Lambda(s|r, s_0)| = (M-m)!$.

Now, using (7.23) in conjunction with Theorem 5.1 we see that under the RORA($r$) policy of Case 2 the miss rate (4.8) for the IRM exists as a constant which depends on the initial cache state $s_0$. We record this fact in the notation by denoting this limiting constant by $M_r(p; s_0)$. As in Case 1, specializing (5.3) leads to

$$\begin{aligned}
M_r(p; s_0) &= C'_r(p, s_0)^{-1} (M-m)! \sum_{\{i_1, \ldots, i_M\} \in \Lambda^\star(r, s_0)} \ \prod_{i_\ell \notin \Sigma_r(s_0)} p(i_\ell) \sum_{i \notin \{i_1, \ldots, i_M\}} p(i) \\
&= C'_r(p, s_0)^{-1} (M-m+1)! \cdot E_{M-m+1,N}(t \cdot p)
\end{aligned} \tag{7.24}$$

where the element $t$ in $\mathbb{R}_+^N$ is specified by $t_i = 0$ for $i$ being a document in $\Sigma_r(s_0)$ and $t_i = 1$ otherwise. Moreover, by the same arguments as in Case 1, we can simplify the normalizing constant $C'_r(p, s_0)$ as

$$\begin{aligned}
C'_r(p, s_0) &= \sum_{(i_1, \ldots, i_M) \in \Lambda(r, s_0)} \ \prod_{i_\ell \notin \Sigma_r(s_0)} p(i_\ell) \\
&= (M-m)! \sum_{\{i_1, \ldots, i_M\} \in \Lambda^\star(r, s_0)} \ \prod_{i_\ell \notin \Sigma_r(s_0)} p(i_\ell) \\
&= (M-m)! \cdot E_{M-m,N}(t \cdot p)
\end{aligned} \tag{7.25}$$

with the element $t$ given as above. It then follows from (7.24) and (7.25) that

$$\begin{aligned}
M_r(p; s_0) &= (M-m+1) \cdot \frac{E_{M-m+1,N}(t \cdot p)}{E_{M-m,N}(t \cdot p)} \\
&= (M-m+1)\, \Phi_{M-m+1,N}(t \cdot p).
\end{aligned} \tag{7.26}$$


Clearly, the documents in $\Sigma_r(s_0)$ do not contribute to the miss rate since they never generate a miss once loaded in cache. This is regardless of the order in which they appear in the cache state $s_0$. This intuitively obvious fact is in agreement with the expression (7.26) from which we see that for any two initial cache states $s_0$ and $s'_0$ in $\Lambda(M; \mathcal{N})$ with $\Sigma_r(s_0) = \Sigma_r(s'_0)$, we have the equality $M_r(p; s_0) = M_r(p; s'_0)$. As a result, we shall find it appropriate to denote this common value by $M_{r,\Sigma_r(s_0)}(p)$.

For any pmf $p$ on $\mathcal{N}$, let $\Sigma^\star(p)$ denote the set of the $m$ most popular documents according to the pmf $p$. Equipped with the expression (7.26), we are now ready to establish the key result for RORA policies in Case 2.

Theorem 7.4 Under any RORA($r$) policy in Case 2 with $|\Sigma_r| = m$ for some $m = 1, \ldots, M-1$, for admissible pmfs $p$ and $q$ on $\mathcal{N}$, it holds that

$$M_{r,\Sigma^\star(q)}(q) \le M_{r,\Sigma^\star(p)}(p) \tag{7.27}$$

whenever $p \prec q$.


Proof. The desired result will be established if we can show that the miss rate function $p \to M_{r,\Sigma_r(s_0)}(p)$ as given in (7.26) is Schur-concave whenever $s_0$ is selected so that $\Sigma_r(s_0) = \Sigma^\star(p)$.

As we can always relabel the documents, there is no loss of generality in assuming $p(1) \ge p(2) \ge \ldots \ge p(N)$, whence $\Sigma^\star(p) = \{1, \ldots, m\}$ and the element $t$ in (7.26) can be specified as $t_1 = \ldots = t_m = 0$ and $t_{m+1} = \ldots = t_N = 1$. By Proposition 2.6, the mapping $\Phi_{M-m+1,N}$ is increasing and Schur-concave on $\mathbb{R}_+^N$, and by virtue of the defining property of $\Sigma^\star(p)$, we have

$$M_{r,\Sigma^\star(p)}(p) = \min_{i = 1, \ldots, N!} (M-m+1)\, \Phi_{M-m+1,N}(t \cdot \sigma_i(p)) \tag{7.28}$$

where $\{\sigma_i,\ i = 1, \ldots, N!\}$ is a collection of all permutations of $\{1, \ldots, N\}$.

The mapping $h : \mathbb{R}^{N!} \to \mathbb{R} : y \to \min(y_1, \ldots, y_{N!})$ is clearly increasing, symmetric and concave, while the mapping $\Phi_{M-m+1,N}$ is concave on $\mathbb{R}_+^N$ by Proposition 2.6. Combining these facts with the expression (7.28) for $M_{r,\Sigma^\star(p)}(p)$, we conclude by Proposition 2.8 to the Schur-concavity (in the pmf vector) of the miss rate functional (7.26) under the RORA policy when $\Sigma_r(s_0) = \Sigma^\star(p)$. ∎
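The role of the pinned set in (7.26) and (7.28) can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names and the example pmf are hypothetical, documents are indexed from 0, and the set of never-evicted documents $\Sigma_r(s_0)$ is passed directly as `pinned`.

```python
from itertools import combinations
from math import prod

def esym(k, v):
    """Elementary symmetric polynomial of order k in the entries of v."""
    return sum(prod(c) for c in combinations(v, k))

def rora_case2_miss_rate(p, M, pinned):
    """Formula (7.26), with t_i = 0 for pinned documents and t_i = 1 otherwise."""
    m = len(pinned)
    tp = [0.0 if i in pinned else p[i] for i in range(len(p))]  # the vector t . p
    return (M - m + 1) * esym(M - m + 1, tp) / esym(M - m, tp)

# Hypothetical example: N = 4 documents, cache size M = 2, one pinned position (m = 1).
p = [0.5, 0.3, 0.15, 0.05]
M, m = 2, 1
rates = {S: rora_case2_miss_rate(p, M, set(S))
         for S in combinations(range(len(p)), m)}
best = min(rates, key=rates.get)  # pinned set with the smallest miss rate
print(best, rates[best])
```

In this example the smallest miss rate is obtained when the pinned document is the most popular one, which is the situation captured by the minimum in (7.28).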


7.3  The  output  under  RORAs 

We  now  discuss  the  popularity  pmf  of  the  output  generated  under  the  RORA  policies 
still  under  the  assumed  IRM  input  stream. 

7.3.1  Case  1 

As we invoke Theorem 5.2, we can insert the expression (7.14) into the relation (5.5). For each $i = 1, \ldots, N$, this yields

$$\begin{aligned}
m_r(i; p) &= \sum_{s \in \Lambda^\star_i(M; \mathcal{N})} C(p)^{-1} M! \cdot p(i_1) p(i_2) \cdots p(i_M) \\
&= \frac{E_{M,N-1}(p^{(i)})}{E_{M,N}(p)}
\end{aligned} \tag{7.29}$$

where the last equality follows from (7.16) and by recalling the definition of $p^{(i)}$ given at (6.7). Reporting (7.29) back into (5.4), we conclude that the popularity pmf $p^\star_r$ of the output produced by the RORA($r$) policy in Case 1 is indeed of the form (6.8), and Theorem 6.4 gives us

Theorem 7.5 Under any RORA($r$) policy in Case 1, it holds that $p^\star_r \prec p$.

By going back to the proof of Theorem 6.4, the reader will readily check from (7.29) that the RORA($r$) policy in Case 1 is indeed a good policy.
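The comparison of Theorem 7.5 can be explored numerically with the sketch below. It assumes that $p^{(i)}$ denotes the vector $p$ with its $i$-th entry removed and that the output pmf is obtained by normalizing $p(i)\, m_r(i; p)$ over $i$ (the normalizing sum being the miss rate); the function names and the example pmf are illustrative only.

```python
from itertools import combinations
from math import prod

def esym(k, v):
    """Elementary symmetric polynomial of order k in the entries of v."""
    return sum(prod(c) for c in combinations(v, k))

def rora_case1_output_pmf(p, M):
    """Output popularity pmf built from (7.29); p must be a list."""
    E_M = esym(M, p)
    # m_r(i; p): stationary probability that document i is not in cache.
    absent = [esym(M, p[:i] + p[i + 1:]) / E_M for i in range(len(p))]
    unnorm = [pi * mi for pi, mi in zip(p, absent)]
    total = sum(unnorm)  # this sum is the miss rate
    return [u / total for u in unnorm]

def is_majorized_by(a, b, tol=1e-9):
    """True if a is majorized by b: partial sums of the decreasing
    rearrangement of a never exceed those of b, with equal totals."""
    a, b = sorted(a, reverse=True), sorted(b, reverse=True)
    sa = sb = 0.0
    for x, y in zip(a, b):
        sa += x
        sb += y
        if sa > sb + tol:
            return False
    return abs(sa - sb) <= tol

p = [0.5, 0.3, 0.2]          # hypothetical input pmf
p_out = rora_case1_output_pmf(p, M=1)
print(p_out, is_majorized_by(p_out, p))
```

For this small example the output pmf is proportional to $p(i)(1 - p(i))$, and the majorization check returns `True`, as Theorem 7.5 predicts.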
7.3.2 Case 2

Assume $|\Sigma_r| = m$ for some $m = 1, \ldots, M-1$, and let the cache be initially in state $s_0$ in $\Lambda(M; \mathcal{N})$. We define the pmf $\pi$ on $\Sigma_r(s_0)^c$ to be the conditional pmf induced on $\Sigma_r(s_0)^c$ by $p$; it is defined as

$$\pi(i) = \frac{p(i)}{\sum_{j \in \Sigma_r(s_0)^c} p(j)}, \quad i \in \Sigma_r(s_0)^c. \tag{7.30}$$

For all $i$ in $\Sigma_r(s_0)$, it is clear that $m_{r,s_0}(i; p) = 0$, while for any document $i$ in $\Sigma_r(s_0)^c$, with the expression for $\mu^\star_{r,s_0}(s; p)$ given in (7.23), we find

$$\begin{aligned}
m_{r,s_0}(i; p) &= \sum_{s \in \Lambda^\star(r, s_0):\ i \notin s} C'_r(p, s_0)^{-1} (M-m)! \cdot \prod_{i_\ell \notin \Sigma_r(s_0)} p(i_\ell) \\
&= \frac{E_{M-m,N}(t^{(1)} \cdot p)}{E_{M-m,N}(t^{(2)} \cdot p)} \\
&= \frac{E_{M-m,N-m-1}(\pi^{(i)})}{E_{M-m,N-m}(\pi)}
\end{aligned} \tag{7.31}$$

where the elements $t^{(1)}$ and $t^{(2)}$ of $\mathbb{R}_+^N$ are specified by $t^{(1)}_j = t^{(2)}_j = 0$ for $j$ being a document in $\Sigma_r(s_0)$, $t^{(1)}_i = 0$, $t^{(2)}_i = 1$ and $t^{(1)}_j = t^{(2)}_j = 1$ for all $j \neq i$ being a document in $\Sigma_r(s_0)^c$. In the second equality we made use of the expression (7.25).

On revisiting the proof of Theorem 6.4, we note that for distinct $i, j$ in $\Sigma_r(s_0)^c$, we have $m_{r,s_0}(i; p) \le m_{r,s_0}(j; p)$ whenever $p(j) \le p(i)$. Consequently, since $m_{r,s_0}(i; p) = 0$ for all $i$ in $\Sigma_r(s_0)$, we conclude that the RORA policy in Case 2 is a good policy if the documents in $\Sigma_r(s_0)$ are the $m$ most popular documents, i.e., $\Sigma_r(s_0) = \Sigma^\star(p)$.

Combining (7.31) with (5.4), we immediately get

$$p^\star_{r,s_0}(i) = \begin{cases} 0 & \text{if } i \in \Sigma_r(s_0) \\[4pt] \dfrac{\pi(i)\, E_{M-m,N-m-1}(\pi^{(i)})}{\sum_{j \in \Sigma_r(s_0)^c} \pi(j)\, E_{M-m,N-m-1}(\pi^{(j)})} & \text{if } i \notin \Sigma_r(s_0). \end{cases} \tag{7.32}$$

Since $p^\star_{r,s_0}(i) = 0$ whenever $i$ belongs to $\Sigma_r(s_0)$, it is more natural to seek a comparison between $p^\star_{r,s_0}$ and the conditional pmf $\pi$.

Theorem 7.6 Under any RORA($r$) policy in Case 2, it holds that $p^\star_{r,s_0} \prec \pi$.

Proof. The arguments are essentially those given in the proof of Theorem 6.4. We immediately obtain the desired result upon identifying $\pi$ and $\Sigma_r(s_0)^c$ with $p$ and $\mathcal{N}$ in Theorem 6.4, respectively. ∎
Chapter 8


Self-organizing Policies


In this chapter, we investigate the folk theorems under the IRM for the miss rate and the output of a cache operated by well-known self-organizing policies, namely, the LRU and CLIMB policies. The LRU and CLIMB policies are described in Section 4.3. From the positive results achieved under the RORA policies, one might expect that the folk theorems would hold under these two self-organizing policies. However, both folk theorems for the miss rate and the output under the LRU and CLIMB policies fail to hold in general. Nonetheless, when we restrict ourselves to the class of IRM inputs with Zipf-like popularity pmf (6.4)-(6.5), simulation results and asymptotics suggest that the folk theorems might hold within this class of popularity pmfs.

We now discuss the results for the LRU and CLIMB policies, respectively.

8.1  The  miss  rate  under  the  LRU  policy 

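Under the IRM with admissible popularity pmf $p$, it is known [2, Thm. 9, p. 130] [24, Thm. 6.5, p. 272] that the LRU cache states $\{\Omega_t,\ t = 0, 1, \ldots\}$ form a stationary ergodic Markov chain over the finite state space $\Lambda(M; \mathcal{N})$ with stationary distribution given by

$$\begin{aligned}
\mu_{\mathrm{LRU}}(s; p) &= \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[\Omega_\tau = s] \quad \text{a.s.} \\
&= \frac{p(i_1) \cdots p(i_M)}{\prod_{k=1}^{M-1} \left( 1 - \sum_{\ell=1}^{k} p(i_\ell) \right)}
\end{aligned} \tag{8.1}$$

for every $s = (i_1, \ldots, i_M)$ in $\Lambda(M; \mathcal{N})$. Consequently, the limit (5.1) exists for each $s = \{i_1, \ldots, i_M\}$ in $\Lambda^\star(M; \mathcal{N})$ as

$$\begin{aligned}
\mu^\star_{\mathrm{LRU}}(s; p) &= \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[S_\tau = s] \quad \text{a.s.} \\
&= \sum_{(j_1, \ldots, j_M) \in \Lambda(s|M; \mathcal{N})} \frac{p(j_1) \cdots p(j_M)}{\prod_{k=1}^{M-1} \left( 1 - \sum_{\ell=1}^{k} p(j_\ell) \right)}
\end{aligned} \tag{8.2}$$

where $\Lambda(s|M; \mathcal{N})$ is defined at (7.13).

The miss rate of the LRU policy under IRM can then be evaluated from (5.3) (see also [2, Chap. 4]) as

$$M_{\mathrm{LRU}}(p) = \sum_{(i_1, \ldots, i_M) \in \Lambda(M; \mathcal{N})} \frac{p(i_1) \cdots p(i_M) \left( 1 - \sum_{\ell=1}^{M} p(i_\ell) \right)}{\prod_{k=1}^{M-1} \left( 1 - \sum_{\ell=1}^{k} p(i_\ell) \right)}. \tag{8.3}$$

If instead we use (5.2), as we note that

$$\sum_{s \in \Lambda^\star_i(M; \mathcal{N})} \left( \sum_{(j_1, \ldots, j_M) \in \Lambda(s|M; \mathcal{N})} \ldots \right) = \sum_{s \in \Lambda_i(M; \mathcal{N})} \ldots$$

it is now plain that

$$M_{\mathrm{LRU}}(p) = \sum_{i=1}^{N} p(i) \sum_{(i_1, \ldots, i_M) \in \Lambda_i(M; \mathcal{N})} \frac{p(i_1) \cdots p(i_M)}{\prod_{k=1}^{M-1} \left( 1 - \sum_{\ell=1}^{k} p(i_\ell) \right)}. \tag{8.4}$$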


8.1.1  A  counterexample 


Contrary to what transpired with RORA policies, the miss rate under the LRU policy is not Schur-concave in general, and consequently the folk theorem (6.1) does not hold. This is demonstrated through the following example developed for $M = 3$ and $N = 4$:

In this case, simple algebraic manipulations transform (8.3) into the simpler expression

$$M_{\mathrm{LRU}}(p) = \sum_{(i_1, i_2) \in \Lambda(2; \mathcal{N})} \frac{2\, p(1) p(2) p(3) p(4)}{\prod_{k=1}^{2} \left( 1 - \sum_{\ell=1}^{k} p(i_\ell) \right)}. \tag{8.5}$$

Figure 8.1: LRU miss rate when $M = 3$, $N = 4$, $y = p(3) = p(4) = 0.05$, $p(1) = x$ and $p(2) = 0.9 - p(1)$

We evaluated the expression (8.5) for the family of pmfs

$$p(x, y) = (x,\ 1 - 2y - x,\ y,\ y), \quad 0 < y < \tfrac{1}{4} \tag{8.6}$$

with $x$ in the interval $[\tfrac{1}{2} - y,\ 1 - 3y]$. Under these constraints, the components of the pmf $p(x, y)$ are listed in decreasing order and for any given $y$, it holds that $p(x, y) \prec p(x', y)$ whenever $x < x'$ in the interval $[\tfrac{1}{2} - y,\ 1 - 3y]$. Therefore, if the miss rate under the LRU policy were indeed a Schur-concave function in the popularity pmf, the functions $x \to M_{\mathrm{LRU}}(p(x, y))$ should be monotone decreasing in $x$ on the interval $[\tfrac{1}{2} - y,\ 1 - 3y]$.

Figures 8.1 and 8.2 display the numerical values of $M_{\mathrm{LRU}}(p(x, y))$ as a function of $x$ with $y = 0.05$ and $y = 0.01$, respectively. In both cases, the miss rate of the LRU policy is not monotone decreasing in $x$ on the range $[\tfrac{1}{2} - y,\ 1 - 3y]$, with the trend becoming more pronounced with decreasing $y$. In short, the miss rate is not Schur-concave under the LRU policy.
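The counterexample can be reproduced with a few lines of Python. The sketch below evaluates the general expression (8.3) by brute-force enumeration of the ordered cache states (feasible only for very small $M$ and $N$) and compares two members of the family (8.6) with $y = 0.01$; the function names and the two sample values of $x$ are choices made here for illustration, and documents are indexed from 0.

```python
from itertools import permutations
from math import prod

def lru_miss_rate(p, M):
    """Evaluate (8.3) by summing over all ordered cache states (i_1, ..., i_M)."""
    total = 0.0
    for s in permutations(range(len(p)), M):
        probs = [p[i] for i in s]
        numer = prod(probs) * (1.0 - sum(probs))
        denom, partial = 1.0, 0.0
        for q in probs[:-1]:            # factors k = 1, ..., M-1
            partial += q
            denom *= 1.0 - partial
        total += numer / denom
    return total

def pmf(x, y):
    """Family (8.6): (x, 1 - 2y - x, y, y)."""
    return [x, 1.0 - 2.0 * y - x, y, y]

y = 0.01
low_skew = lru_miss_rate(pmf(0.49, y), M=3)   # x at the left end of the interval
high_skew = lru_miss_rate(pmf(0.90, y), M=3)  # a more skewed pmf in the majorization order
print(low_skew, high_skew)
```

With these inputs the more skewed pmf produces the larger miss rate, so the map $x \to M_{\mathrm{LRU}}(p(x, y))$ cannot be monotone decreasing on the whole interval.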
Figure 8.2: LRU miss rate when $M = 3$, $N = 4$, $y = p(3) = p(4) = 0.01$, $p(1) = x$ and $p(2) = 0.98 - p(1)$

8.1.2  LRU  miss  rate  and  IRM  with  Zipf-like  popularity  pmfs 

While the miss rate is not Schur-concave under the LRU policy, the desired monotonicity (6.1) is nevertheless true in an asymptotic sense when the popularity pmf is restricted to the class of Zipf-like pmfs.

Theorem 8.1 Assume the IRM input to have a Zipf-like popularity pmf $p_\alpha$ for some $\alpha > 0$. Then, there exists $\alpha^\star = \alpha^\star(M, N) > 0$ and $\Delta > 0$ such that $M_{\mathrm{LRU}}(p_\beta) < M_{\mathrm{LRU}}(p_\alpha)$ whenever $\alpha^\star < \alpha$ and $\alpha + \Delta < \beta$.

This result is a byproduct of the asymptotic equivalence

$$M_{\mathrm{LRU}}(p_\alpha) \sim 2 (M+1)^{-\alpha} \quad (\alpha \to \infty) \tag{8.7}$$

established in Appendix B.1. Indeed, for every $\varepsilon$ in the interval $(0, 1)$, there exists $\alpha^\star(M, N) > 0$ such that for $\alpha > \alpha^\star$,

$$1 - \varepsilon < \frac{M_{\mathrm{LRU}}(p_\alpha)}{2 (M+1)^{-\alpha}} < 1 + \varepsilon. \tag{8.8}$$

Thus, for $\alpha^\star < \alpha < \beta$, we conclude that

$$\frac{1 - \varepsilon}{1 + \varepsilon} \cdot (M+1)^{\beta - \alpha} \le \frac{M_{\mathrm{LRU}}(p_\alpha)}{M_{\mathrm{LRU}}(p_\beta)} \le \frac{1 + \varepsilon}{1 - \varepsilon} \cdot (M+1)^{\beta - \alpha} \tag{8.9}$$

and the desired result follows whenever $\beta - \alpha > \Delta$ with $\Delta > 0$ selected such that

$$\frac{1 + \varepsilon}{1 - \varepsilon} = (M+1)^{\Delta}.$$

Of course such a selection is always possible.

We have also carried out simulations of a cache operating under the LRU policy when the IRM input has a Zipf-like popularity pmf $p_\alpha$.¹ The number of documents is set at $N = 1{,}000$ while the cache size is $M = 100$. The miss rate of the LRU policy is displayed in Figures 8.3 and 8.4 for small $\alpha$ ($0 < \alpha < 1$) and large $\alpha$ ($\alpha > 1$), respectively. It appears that the miss rate is indeed decreasing as the skewness parameter $\alpha$ increases across the entire range of $\alpha$. This suggests that the folk theorem for miss rates probably holds under the LRU policy when the comparison is made within the class of Zipf-like popularity pmfs, hence the following

Conjecture 8.2 For arbitrary cache size $M$ and number of documents $N$, the function $\alpha \to M_{\mathrm{LRU}}(p_\alpha)$ is strictly decreasing on $[0, \infty)$.

¹ We choose simulations over numerical evaluation of (8.3) because this expression is not suitable for numerical evaluation due to a combinatorial explosion, as pointed out in [33].
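A simulation of this kind can be sketched as follows. This is not the simulator used for Figures 8.3 and 8.4; it is a minimal illustration that assumes the Zipf-like pmf has the form $p_\alpha(i) \propto i^{-\alpha}$, starts from an empty cache (so cold-start misses are counted), and uses smaller, arbitrarily chosen values of $N$, $M$ and the number of requests to keep the run short.

```python
import random
from collections import OrderedDict
from itertools import accumulate

def zipf_pmf(N, alpha):
    """Zipf-like pmf on documents 1..N (assumed form: p(i) proportional to i**(-alpha))."""
    weights = [i ** -alpha for i in range(1, N + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def simulate_lru_miss_rate(p, M, num_requests, seed=0):
    """Empirical LRU miss rate over an i.i.d. (IRM) request stream drawn from p."""
    rng = random.Random(seed)
    requests = rng.choices(range(len(p)), cum_weights=list(accumulate(p)), k=num_requests)
    cache = OrderedDict()               # keys ordered from least to most recently used
    misses = 0
    for doc in requests:
        if doc in cache:
            cache.move_to_end(doc)      # hit: mark as most recently used
        else:
            misses += 1
            cache[doc] = None
            if len(cache) > M:
                cache.popitem(last=False)  # evict the least recently used document
    return misses / num_requests

N, M, T = 200, 20, 20000                # illustrative sizes, smaller than in the text
rate_small_alpha = simulate_lru_miss_rate(zipf_pmf(N, 0.5), M, T)
rate_large_alpha = simulate_lru_miss_rate(zipf_pmf(N, 2.0), M, T)
print(rate_small_alpha, rate_large_alpha)
```

Sweeping `alpha` over a grid and plotting the resulting rates gives a rough picture of the kind of monotone trend discussed above.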
Figure 8.3: LRU miss rate when the IRM input has a Zipf-like popularity pmf $p_\alpha$ for $\alpha$ small ($0 < \alpha < 1$)


Figure 8.4: LRU miss rate when the IRM input has a Zipf-like popularity pmf $p_\alpha$ for $\alpha$ large ($\alpha > 1$)
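8.2 The output under the LRU policy


With the expression (8.1) for the LRU cache stationary distribution under the IRM, it is a simple matter to check for each $i = 1, \ldots, N$, that

$$\begin{aligned}
m_{\mathrm{LRU}}(i; p) &= \sum_{s \in \Lambda_i(M; \mathcal{N})} \mu_{\mathrm{LRU}}(s; p) \\
&= \sum_{s \in \Lambda_i(M; \mathcal{N})} \frac{p(i_1) \cdots p(i_M)}{\prod_{k=1}^{M-1} \left( 1 - \sum_{\ell=1}^{k} p(i_\ell) \right)}.
\end{aligned} \tag{8.10}$$

Theorem 5.2 then gives the output popularity pmf in the form

$$p^\star_{\mathrm{LRU}}(i) = \frac{p(i)}{M_{\mathrm{LRU}}(p)} \sum_{s \in \Lambda_i(M; \mathcal{N})} \frac{p(i_1) \cdots p(i_M)}{\prod_{k=1}^{M-1} \left( 1 - \sum_{\ell=1}^{k} p(i_\ell) \right)} \tag{8.11}$$

for each $i = 1, \ldots, N$, as we make use of (5.8).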


8.2.1  LRU  is  a  good  policy 

We  begin  with  a  positive  result. 

Lemma  8.3  The  LRU  policy  is  a  good  policy. 


Proof. Pick distinct $i, j = 1, \ldots, N$ with $p(j) \le p(i)$. We need to show that

$$m_{\mathrm{LRU}}(i; p) \le m_{\mathrm{LRU}}(j; p). \tag{8.12}$$

We begin by writing $m_{\mathrm{LRU}}(i; p)$ as

$$m_{\mathrm{LRU}}(i; p) = \sum_{s \in \Lambda_i(M; \mathcal{N}):\ j \notin s} \mu_{\mathrm{LRU}}(s; p) + \sum_{s \in \Lambda_i(M; \mathcal{N}):\ j \in s} \mu_{\mathrm{LRU}}(s; p) \tag{8.13}$$

with a similar expression for $m_{\mathrm{LRU}}(j; p)$. The fact that the sets $\{s \in \Lambda_i(M; \mathcal{N}) : j \notin s\}$ and $\{s \in \Lambda_j(M; \mathcal{N}) : i \notin s\}$ coincide leads to

$$m_{\mathrm{LRU}}(i; p) - m_{\mathrm{LRU}}(j; p) = \sum_{s \in \Lambda_i(M; \mathcal{N}):\ j \in s} \mu_{\mathrm{LRU}}(s; p) - \sum_{s \in \Lambda_j(M; \mathcal{N}):\ i \in s} \mu_{\mathrm{LRU}}(s; p). \tag{8.14}$$

The sets $\{s \in \Lambda_i(M; \mathcal{N}) : j \in s\}$ and $\{s \in \Lambda_j(M; \mathcal{N}) : i \in s\}$ can be put into one-to-one correspondence with each other as follows: Each element $s$ in the former set does not contain $i$ but contains $j$ in exactly one position, say position $k$ for some $k = 1, \ldots, M$, with all other positions occupied by neither $i$ nor $j$. Thus, with such an element $s$ we can associate an element $T(s)$ in $\Lambda_j(M; \mathcal{N})$ by substituting $i$ for $j$ at position $k$ and leaving all other positions unchanged. This element $T(s)$ now contains $i$ but not $j$ anymore, and is therefore an element of the latter set. Moreover, for such an element $T(s)$ it holds that

$$\mu_{\mathrm{LRU}}(s; p) \le \mu_{\mathrm{LRU}}(T(s); p) \tag{8.15}$$

as a consequence of the assumption $p(j) \le p(i)$ and of the expression (8.1). With these observations in mind, we find that

$$\begin{aligned}
\sum_{s \in \Lambda_j(M; \mathcal{N}):\ i \in s} \mu_{\mathrm{LRU}}(s; p) &= \sum_{s \in \Lambda_i(M; \mathcal{N}):\ j \in s} \mu_{\mathrm{LRU}}(T(s); p) \\
&\ge \sum_{s \in \Lambda_i(M; \mathcal{N}):\ j \in s} \mu_{\mathrm{LRU}}(s; p)
\end{aligned}$$

and the conclusion (8.12) is now immediate via (8.14). ∎


8.2.2  Counterexamples 

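In view of Lemma 8.3, it is tempting to expect that the majorization comparison $p^\star_{\mathrm{LRU}} \prec p$ also holds under the LRU policy. This is not true in general as the following counterexamples show: Fix $N = 2, 3, \ldots$. Assume that the input to the cache is the IRM with popularity pmf $p_\varepsilon$ where we set

$$p_\varepsilon = (1 - (N-1)\varepsilon,\ \varepsilon,\ \ldots,\ \varepsilon) \tag{8.16}$$

for some $0 < \varepsilon < \frac{1}{N}$. Note that $p_\varepsilon(1) > p_\varepsilon(2) = \cdots = p_\varepsilon(N)$, and as $\varepsilon \to \frac{1}{N}$, the pmf $p_\varepsilon$ approaches the uniform distribution $u$ while as $\varepsilon \to 0$, it degenerates to $(1, 0, \ldots, 0)$. Indeed, from Lemma 2.5, we find that $p_{\varepsilon_1} \prec p_{\varepsilon_2}$ whenever $\varepsilon_2 < \varepsilon_1$.

Under the LRU policy, it is plain from (8.10)-(8.11) that the output popularity pmf $p^\star_{\mathrm{LRU},\varepsilon}$ is of the form

$$p^\star_\varepsilon = (1 - (N-1)\delta(\varepsilon),\ \delta(\varepsilon),\ \ldots,\ \delta(\varepsilon)) \tag{8.17}$$

for some mapping $\delta : (0, \frac{1}{N}] \to (0, \frac{1}{N-1})$. Because of their special structures, (8.16) and (8.17), the comparison between $p_\varepsilon$ and $p^\star_{\mathrm{LRU},\varepsilon}$ depends only on the value of $\delta(\varepsilon)$; this fact is stated in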

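Proposition 8.4 For each $0 < \varepsilon < \frac{1}{N}$ let $p_\varepsilon$ and $p^\star_\varepsilon$ be the pmfs of the form (8.16) and (8.17), respectively.

(i) If $0 < \delta(\varepsilon) < \varepsilon$, then the comparison $p_\varepsilon \prec p^\star_\varepsilon$ holds;

(ii) If $\varepsilon \le \delta(\varepsilon) \le \frac{1 - \varepsilon}{N - 1}$, then the comparison $p^\star_\varepsilon \prec p_\varepsilon$ holds;

(iii) If $\frac{1 - \varepsilon}{N - 1} < \delta(\varepsilon) < \min\left(1 - (N-1)\varepsilon,\ \frac{1}{N-1}\right)$, then neither the comparison $p^\star_\varepsilon \prec p_\varepsilon$ nor the comparison $p_\varepsilon \prec p^\star_\varepsilon$ holds; and

(iv) If $\min\left(1 - (N-1)\varepsilon,\ \frac{1}{N-1}\right) \le \delta(\varepsilon) < \frac{1}{N-1}$, then the comparison $p_\varepsilon \prec p^\star_\varepsilon$ holds.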


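Proof. Fix $0 < \varepsilon < \frac{1}{N}$. The discussion is separated into 2 cases, namely (a) $0 < \delta(\varepsilon) \le \frac{1}{N}$ and (b) $\frac{1}{N} < \delta(\varepsilon) < \frac{1}{N-1}$.

Case (a): With $0 < \delta(\varepsilon) \le \frac{1}{N}$, we note that $p^\star_\varepsilon(1) \ge p^\star_\varepsilon(2) = \cdots = p^\star_\varepsilon(N)$. By Lemma 2.5, the comparison $p^\star_\varepsilon \prec p_\varepsilon$ (resp. $p_\varepsilon \prec p^\star_\varepsilon$) holds whenever

$$\delta(\varepsilon) \ge (\le)\ \varepsilon, \tag{8.18}$$

and Claim (i) is obtained.

Case (b): When $\frac{1}{N} < \delta(\varepsilon) < \frac{1}{N-1}$ we have $p^\star_\varepsilon(1) < p^\star_\varepsilon(2) = \cdots = p^\star_\varepsilon(N)$. In this case, the conditions (2.1) for the majorization comparison $p^\star_\varepsilon \prec p_\varepsilon$ (resp. $p_\varepsilon \prec p^\star_\varepsilon$) are simply

$$k\,\delta(\varepsilon) + (N - k)\,\varepsilon \le (\ge)\ 1, \quad k = 1, \ldots, N-1. \tag{8.19}$$

Because $\delta(\varepsilon) > \varepsilon$ in this case, the left-hand side of (8.19) is monotone increasing in $k$. From this observation and (8.19), the comparison $p^\star_\varepsilon \prec p_\varepsilon$ will hold if

$$\delta(\varepsilon) \le \frac{1 - \varepsilon}{N - 1}, \tag{8.20}$$

while the comparison $p_\varepsilon \prec p^\star_\varepsilon$ will hold if

$$\delta(\varepsilon) \ge 1 - (N-1)\varepsilon. \tag{8.21}$$

However, neither the comparison $p_\varepsilon \prec p^\star_\varepsilon$ nor the comparison $p^\star_\varepsilon \prec p_\varepsilon$ holds if

$$\frac{1 - \varepsilon}{N - 1} < \delta(\varepsilon) < 1 - (N-1)\varepsilon. \tag{8.22}$$

Combining (8.18) and (8.20) yields Claim (ii). Upon recalling that $\delta(\varepsilon) < \frac{1}{N-1}$, we obtain Claims (iii) and (iv) from (8.22) and (8.21), respectively. ∎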


Using Proposition 8.4, we show under the LRU policy that it is possible to find some $0 < \varepsilon < \frac{1}{N}$ such that $\delta(\varepsilon) > \frac{1 - \varepsilon}{N - 1}$, and thus the desired comparison $p^\star_{\mathrm{LRU},\varepsilon} \prec p_\varepsilon$ does not hold. This result is given in the following theorem; its proof is available in Appendix C.1.


Theorem  8.5  Assume  the  IRM  input  to  have  the  popularity  pmf  p£  for  some  0  <  e  < 
jf.  Under  the  LRU  policy,  whenever 


0  <  e  < 


\^i= i  N-e)  1 

TA  ’ 

N_£j 


(8.23) 
the comparison $p^\star_{\mathrm{LRU},\varepsilon} \prec p_\varepsilon$ does not hold provided that the number of documents $N$
and  the  cache  size  M  satisfy  the  condition  J^eLi1  jyzi  >  1- 

For example, if we take $p_\varepsilon$ with parameters $N = 10$ and $\varepsilon = 0.05$ and set the cache size $M = 8$, a simple calculation yields $\delta(\varepsilon) = 0.1111$ and the assumptions of Theorem 8.5 are satisfied. Thus, the comparison $p^\star_{\mathrm{LRU},\varepsilon} \prec p_\varepsilon$ does not hold. However, the entropy of $p_\varepsilon$ is smaller than the entropy of $p^\star_{\mathrm{LRU},\varepsilon}$, i.e.,

$$0.7283 = H(p_\varepsilon) < H(p^\star_{\mathrm{LRU},\varepsilon}) = 0.9554.$$

This suggests that $p^\star_{\mathrm{LRU},\varepsilon}$ is more balanced than $p_\varepsilon$ in the sense of entropy comparison. Hence, even though the comparison in the majorization ordering does not hold, the entropy comparison might still be valid. This should not come as a surprise since the majorization comparison is a stronger notion than the entropy comparison.
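The example above can be checked with a short script. The sketch below takes the rounded value $\delta(\varepsilon) = 0.1111$ from the text as given (it does not recompute it from the LRU stationary distribution) and assumes base-10 logarithms for the entropy, a choice that reproduces the reported value $H(p_\varepsilon) = 0.7283$; the helper names are illustrative.

```python
from math import log10

def is_majorized_by(a, b, tol=1e-9):
    """True if a is majorized by b (partial sums of decreasing rearrangements)."""
    a, b = sorted(a, reverse=True), sorted(b, reverse=True)
    sa = sb = 0.0
    for x, y in zip(a, b):
        sa += x
        sb += y
        if sa > sb + tol:
            return False
    return abs(sa - sb) <= tol

def entropy10(p):
    """Shannon entropy with base-10 logarithms (assumed base)."""
    return -sum(x * log10(x) for x in p if x > 0)

N, eps, delta = 10, 0.05, 0.1111        # delta as reported (rounded) in the text
p_eps = [1 - (N - 1) * eps] + [eps] * (N - 1)        # input pmf (8.16)
p_out = [1 - (N - 1) * delta] + [delta] * (N - 1)    # output pmf of the form (8.17)

print(is_majorized_by(p_out, p_eps), is_majorized_by(p_eps, p_out))
print(entropy10(p_eps), entropy10(p_out))
```

Both majorization checks come out `False`, which corresponds to case (iii) of Proposition 8.4, while the entropy of the output pmf is the larger of the two. Because $\delta$ is rounded, the computed output entropy only approximates the value quoted in the text.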

As for the case of the LRU miss rate, we would expect that the comparison $p^\star_{\mathrm{LRU}} \prec p$ under the LRU policy would hold within the class of IRM inputs with Zipf-like popularity pmf $p_\alpha$. However, this is not the case as the following example demonstrates: With $M = 3$ and $N = 4$ under the Zipf-like popularity pmf (6.4)-(6.5) with $\alpha = 3$, we have computed the output popularity pmf under the LRU policy using (8.11). The numerical values of both input and output popularity pmfs are given in Table 8.1.

Table 8.1: $p_\alpha$ and $p^\star_{\mathrm{LRU},\alpha}$ under the LRU policy when the IRM input has a Zipf-like popularity pmf $p_\alpha$ with parameter $\alpha = 3$

    i                                 1        2        3        4
    p_α(i)                            0.8491   0.1061   0.0314   0.0133
    p*_{LRU,α}(i)                     0.0118   0.2031   0.3853   0.3998

By the definition of majorization (2.1)-(2.2), the comparison $p^\star_{\mathrm{LRU},\alpha} \prec p_\alpha$ requires

$$\min_{i = 1, \ldots, N} p_\alpha(i) \le \min_{i = 1, \ldots, N} p^\star_{\mathrm{LRU},\alpha}(i), \tag{8.24}$$

in clear contradiction with Table 8.1, and therefore does not hold. On the other hand, the comparison $p_\alpha \prec p^\star_{\mathrm{LRU},\alpha}$ is not valid either since it calls for the unmet requirement

$$\max_{i = 1, \ldots, N} p_\alpha(i) \le \max_{i = 1, \ldots, N} p^\star_{\mathrm{LRU},\alpha}(i). \tag{8.25}$$

In short, $p_\alpha$ and $p^\star_{\mathrm{LRU},\alpha}$ are not comparable in the majorization ordering. This situation does not represent an isolated incident as the next theorem shows; its proof is available in Appendix B.2.
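The entries of Table 8.1 can be recomputed by brute force from (8.1) and (8.11). The following sketch assumes the Zipf-like pmf has the form $p_\alpha(i) \propto i^{-\alpha}$ and indexes documents from 0; the function names are illustrative, and the enumeration over ordered cache states is practical only for very small $M$ and $N$.

```python
from itertools import permutations
from math import prod

def zipf_pmf(N, alpha):
    """Zipf-like pmf on documents 1..N (assumed form: p(i) proportional to i**(-alpha))."""
    weights = [i ** -alpha for i in range(1, N + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def lru_stationary(s, p):
    """Stationary probability (8.1) of the ordered cache state s."""
    probs = [p[i] for i in s]
    denom, partial = 1.0, 0.0
    for q in probs[:-1]:                # factors k = 1, ..., M-1
        partial += q
        denom *= 1.0 - partial
    return prod(probs) / denom

def lru_output_pmf(p, M):
    """Output popularity pmf: p(i) * m_LRU(i; p), normalized by the miss rate."""
    N = len(p)
    absent = [0.0] * N                  # m_LRU(i; p): probability that i is not cached
    for s in permutations(range(N), M):
        mu = lru_stationary(s, p)
        for i in set(range(N)) - set(s):
            absent[i] += mu
    unnorm = [p[i] * absent[i] for i in range(N)]
    miss_rate = sum(unnorm)
    return [u / miss_rate for u in unnorm]

p_in = zipf_pmf(4, 3)
p_out = lru_output_pmf(p_in, 3)
print([round(x, 4) for x in p_in])
print([round(x, 4) for x in p_out])
```

The printed vectors can be compared with the two rows of Table 8.1; the minimum and the maximum of the input pmf both exceed those of the output pmf, which is what rules out both majorization comparisons.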

Theorem 8.6 Assume the IRM input to have a Zipf-like popularity pmf $p_\alpha$ for some $\alpha > 0$. If the number of documents $N$ and the cache size $M$ satisfy the condition

$$N < M!, \tag{8.26}$$

then under the LRU policy, there exists $\alpha^\star = \alpha^\star(M, N)$ such that $p^\star_{\mathrm{LRU},\alpha} \prec p_\alpha$ does not hold whenever $\alpha > \alpha^\star$.

8.2.3  A  conjecture 

Theorems 7.5 and 7.6 were valid for all values of $M$ and $N$, and for arbitrary admissible pmfs. While the counterexamples discussed earlier dash our hope to get an analogous result for the LRU policy, the possibility remains, fueled by Corollary 6.8, that the positive result is nevertheless valid in some appropriate range of the parameters $M$ and $N$. We now explore this issue still with Zipf-like popularity pmfs (6.4)-(6.5).

Conjecture 8.7 Assume the IRM input to have a Zipf-like popularity pmf $p_\alpha$ for some $\alpha > 0$. For each $N = 1, 2, \ldots$, under the LRU policy, there exists an integer $M^\star = M^\star(\alpha; N)$ with $1 \le M^\star < N$ such that $p^\star_{\mathrm{LRU},\alpha} \prec p_\alpha$ whenever $M = 1, \ldots, M^\star$.

In support of this conjecture, we have carried out simulations of the cache operating under the LRU policy when the IRM input has a Zipf-like popularity pmf with parameter $\alpha = 0.8$, $1$ and $2$, and $N = 1{,}000$. We find the output popularity pmfs for different values of the cache size, namely $M = 10, 50, 100$ and $500$. The resulting output popularity pmfs in the original order of documents are shown in Figure 8.5, while the results after rearranging documents in decreasing order of their output probabilities are displayed in Figure 8.6.
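For readers who wish to set up an experiment of this kind, the following Python sketch shows one way it might be done. It is an illustration under stated assumptions, not the code used to produce Figures 8.5 and 8.6: the function names, the sample size, the random seed and the convention that the cache starts empty (so that cold-start misses are counted) are all choices made here. The output popularity pmf is estimated as the fraction of misses incurred by each document.

```python
import random
from collections import OrderedDict


def zipf_pmf(num_docs, alpha):
    """Zipf-like pmf: p(i) proportional to i**(-alpha), i = 1, ..., num_docs."""
    weights = [i ** (-alpha) for i in range(1, num_docs + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def lru_output_pmf(p, cache_size, num_requests, seed=0):
    """Estimate the output (miss-stream) popularity pmf of an LRU cache under IRM input.

    Assumes num_requests >= 1 and an initially empty cache (hypothetical choices).
    Documents are indexed 0, ..., len(p) - 1.
    """
    rng = random.Random(seed)
    docs = range(len(p))
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = [0] * len(p)
    for doc in rng.choices(docs, weights=p, k=num_requests):
        if doc in cache:
            cache.move_to_end(doc)  # hit: refresh recency
        else:
            misses[doc] += 1  # miss: the request is forwarded to the output
            cache[doc] = None
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used document
    total = sum(misses)
    return [m / total for m in misses]
```

A call such as `lru_output_pmf(zipf_pmf(1000, 0.8), cache_size=10, num_requests=10**6)` would give an estimate whose accuracy depends on the number of simulated requests.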

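From Figure 8.6 (a), when $\alpha = 0.8$, the comparison $p^\star_{\mathrm{LRU},\alpha} \prec p_\alpha$ holds for $M = 10, 50$. This follows from the sufficient condition for majorization comparison provided in Proposition 2.1. Indeed, from their respective plots, we observe that the pmfs $p_\alpha$ and $p^\star_{\mathrm{LRU},\alpha}$, when arranged in decreasing order, intersect only once, namely $p^\star_{\mathrm{LRU},\alpha}([i]) \le p_\alpha(i)$, $i = 1, \ldots, k$, and $p^\star_{\mathrm{LRU},\alpha}([i]) \ge p_\alpha(i)$, $i = k + 1, \ldots, N$, for some $k = 1, \ldots, N - 1$, where $p^\star_{\mathrm{LRU},\alpha}([1]) \ge p^\star_{\mathrm{LRU},\alpha}([2]) \ge \cdots \ge p^\star_{\mathrm{LRU},\alpha}([N])$ are the components of $p^\star_{\mathrm{LRU},\alpha}$ arranged in decreasing order.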
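The majorization comparison itself can also be tested directly from its partial-sum characterization, which may be convenient when the single-crossing condition is hard to read off a plot. The sketch below is a minimal illustration: the function name and the numerical tolerance are assumptions made here, and $q \prec p$ is taken in the usual sense that the partial sums of the decreasing rearrangement of $q$ never exceed those of $p$, with equal totals.

```python
def is_majorized_by(q, p, tol=1e-12):
    """Return True if q is majorized by p (q ≺ p), up to the tolerance tol."""
    if len(q) != len(p):
        raise ValueError("q and p must have the same length")
    q_sorted = sorted(q, reverse=True)
    p_sorted = sorted(p, reverse=True)
    if abs(sum(q_sorted) - sum(p_sorted)) > tol:
        return False  # totals must agree
    partial_q = partial_p = 0.0
    for q_k, p_k in zip(q_sorted, p_sorted):
        partial_q += q_k
        partial_p += p_k
        if partial_q > partial_p + tol:
            return False  # a partial sum of q exceeds that of p
    return True
```

For instance, the uniform pmf on four documents is majorized by any other pmf on four documents, while the converse fails unless that pmf is itself uniform.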

However, for $\alpha = 0.8$ and $M = 100, 500$, despite the fact that in Figure 8.6 (a) the output pmf $p^\star_{\mathrm{LRU},\alpha}$ looks uniform in both cases over the range where the document rank is smaller than $M$, the comparison $p^\star_{\mathrm{LRU},\alpha} \prec p_\alpha$ is invalid since the necessary condition (8.24) does not hold. This violation, $\min_{i=1,\ldots,N} p^\star_{\mathrm{LRU},\alpha}(i) < p_\alpha(N)$, can easily be seen from Figure 8.5 (a) or from the subfigure inside Figure 8.6 (a).

For $\alpha = 1$ and $\alpha = 2$, by the same arguments, we conclude from Figures 8.5 (b)-(c) and 8.6 (b)-(c) that the comparison $p^\star_{\mathrm{LRU},\alpha} \prec p_\alpha$ holds for $M = 10$ but does not hold for the other cache sizes $M = 50, 100, 500$. Therefore, these experimental findings agree with Conjecture 8.7 and suggest that the value of $M^\star(\alpha; N)$ in Conjecture 8.7 decreases as $\alpha$ increases. This last observation is supported by the fact that for $\alpha = 0$, both $p_0$ and $p^\star_{\mathrm{LRU},0}$ are the uniform pmf $u$ on $\mathcal{N}$, so that the comparison $p^\star_{\mathrm{LRU},0} \prec p_0$ holds for all $M = 1, \ldots, N - 1$, whence $M^\star(0; N) = N - 1$.

Figure 8.5: LRU output popularity pmf with different cache sizes $M$ when the IRM input has a Zipf-like popularity pmf $p_\alpha$ with (a) $\alpha = 0.8$, (b) $\alpha = 1$ and (c) $\alpha = 2$. Documents are arranged in the original order of the input pmf $p_\alpha$.

Figure 8.6: LRU output popularity pmf with different cache sizes $M$ when the IRM input has a Zipf-like popularity pmf $p_\alpha$ with (a) $\alpha = 0.8$, (b) $\alpha = 1$ and (c) $\alpha = 2$. Documents are ranked according to their probabilities.

8.3 The miss rate under the CLIMB policy


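Under the IRM assumption on the input, the CLIMB cache states $\{\Omega_t, t = 0, 1, \ldots\}$ form a stationary ergodic Markov chain on the finite state space $\Lambda(M; \mathcal{N})$ with stationary distribution [2, p. 133] given by

$$\mu_{\mathrm{CL}}(s; p) = \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[\Omega_\tau = s] \ \text{a.s.} = \frac{1}{K_{\mathrm{CL}}} \prod_{\ell=1}^{M} p(i_\ell)^{M-\ell+1} \tag{8.27}$$

for each $s = (i_1, \ldots, i_M)$ in $\Lambda(M; \mathcal{N})$, where the normalizing constant is simply

$$K_{\mathrm{CL}} := \sum_{(i_1, \ldots, i_M) \in \Lambda(M; \mathcal{N})} \prod_{\ell=1}^{M} p(i_\ell)^{M-\ell+1}.$$

The limit (5.1) then exists for each $s = \{i_1, \ldots, i_M\}$ in $\Lambda^\star(M; \mathcal{N})$ as

$$\mu^\star_{\mathrm{CL}}(s; p) = \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[S_\tau = s] \ \text{a.s.} = \frac{1}{K_{\mathrm{CL}}} \sum_{(j_1, \ldots, j_M) \in \Lambda(s | M; \mathcal{N})} \prod_{\ell=1}^{M} p(j_\ell)^{M-\ell+1}. \tag{8.28}$$

The miss rate of the CLIMB policy under IRM can now be obtained [2, Chap. 4] from (5.3) as

$$M_{\mathrm{CL}}(p) = \frac{1}{K_{\mathrm{CL}}} \sum_{(i_1, \ldots, i_M) \in \Lambda(M; \mathcal{N})} \prod_{\ell=1}^{M} p(i_\ell)^{M-\ell+1} \left( 1 - \sum_{j=1}^{M} p(i_j) \right) \tag{8.29}$$

or from (5.2) as

$$M_{\mathrm{CL}}(p) = \sum_{i=1}^{N} p(i) \sum_{s \in \Lambda_i(M; \mathcal{N})} \frac{1}{K_{\mathrm{CL}}} \prod_{\ell=1}^{M} p(i_\ell)^{M-\ell+1}. \tag{8.30}$$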
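For small values of $M$ and $N$, the expressions (8.29) and (8.30) can be evaluated by brute-force enumeration of the cache states. The following Python sketch is offered only as an illustration of how this might be done: the function names are hypothetical, documents are indexed from 0, and the cost grows like $N!/(N-M)!$, so the approach is impractical for large systems.

```python
from itertools import permutations
from math import prod


def climb_state_weights(p, cache_size):
    """Unnormalized stationary weights of (8.27), one per ordered cache state."""
    return {
        state: prod(p[doc] ** (cache_size - pos) for pos, doc in enumerate(state))
        for state in permutations(range(len(p)), cache_size)
    }


def climb_miss_rate(p, cache_size):
    """CLIMB miss rate under IRM, following (8.29)."""
    weights = climb_state_weights(p, cache_size)
    k_cl = sum(weights.values())  # normalizing constant K_CL
    return sum(w * (1.0 - sum(p[d] for d in s)) for s, w in weights.items()) / k_cl


def climb_miss_rate_per_document(p, cache_size):
    """Same miss rate, computed document by document as in (8.30)."""
    weights = climb_state_weights(p, cache_size)
    k_cl = sum(weights.values())
    return sum(
        p[i] * sum(w for s, w in weights.items() if i not in s) / k_cl
        for i in range(len(p))
    )
```

Both functions should agree up to rounding; for the uniform pmf the miss rate reduces to $1 - M/N$, which gives a quick sanity check.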


8.3.1  A  counterexample 

As in the case of the LRU miss rate, the miss rate for the CLIMB policy is in general not a Schur-concave function, and thus the folk theorem (6.1) does not hold. We demonstrate this fact through the same counterexample developed for the LRU policy in Section 8.1.1.

Figure 8.7: CLIMB miss rate when $M = 3$, $N = 4$, $y = p(3) = p(4) = 0.05$, $p(1) = x$ and $p(2) = 0.9 - p(1)$.

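In that case, we set $M = 3$ and $N = 4$, and the expression (8.29) can be simplified as

$$M_{\mathrm{CL}}(p) = \frac{\sum_{(i_1, i_2, i_3) \in \Lambda(3; \mathcal{N})} p(i_1)^3 p(i_2)^2 p(i_3) \left( 1 - p(i_1) - p(i_2) - p(i_3) \right)}{\sum_{(i_1, i_2, i_3) \in \Lambda(3; \mathcal{N})} p(i_1)^3 p(i_2)^2 p(i_3)}. \tag{8.31}$$

The numerical values of the expression (8.31) are evaluated for the family of pmfs (8.6) with $x$ in the interval $[\frac{1}{2} - y, 1 - 3y]$. Under these constraints, it holds that $p(x, y) \prec p(x', y)$ whenever $x < x'$ in the interval $[\frac{1}{2} - y, 1 - 3y]$, and for the CLIMB miss rate to be Schur-concave, the function $x \to M_{\mathrm{CL}}(p(x, y))$ must be monotone decreasing on the interval $[\frac{1}{2} - y, 1 - 3y]$.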

Figures 8.7 and 8.8 display the numerical values of $M_{\mathrm{CL}}(p(x, y))$ as a function of $x$ with $y = 0.05$ and $y = 0.01$, respectively. In both cases, the miss rate of the CLIMB policy is not monotone decreasing in $x$ on the entire range, and thus the miss rate is not always Schur-concave under the CLIMB policy.

Figure 8.8: CLIMB miss rate when $M = 3$, $N = 4$, $y = p(3) = p(4) = 0.01$, $p(1) = x$ and $p(2) = 0.98 - p(1)$.

8.3.2  CLIMB  miss  rate  and  IRM  with  Zipf-like  popularity  pmfs 

Although the CLIMB miss rate is not Schur-concave in general, the desired monotonicity (6.1) holds asymptotically when the popularity pmf of the IRM input lies in the class of Zipf-like pmfs.

Theorem 8.8 Assume the IRM input to have a Zipf-like popularity pmf $p_\alpha$ for some $\alpha > 0$. Then, there exist $\alpha^\star = \alpha^\star(M, N) > 0$ and $\Delta > 0$ such that $M_{\mathrm{CL}}(p_\beta) < M_{\mathrm{CL}}(p_\alpha)$ whenever $\alpha^\star < \alpha$ and $\alpha + \Delta < \beta$.

Similarly to Theorem 8.1, this theorem is a by-product of the asymptotics

$$\lim_{\alpha \to \infty} \frac{M_{\mathrm{CL}}(p_\alpha)}{(M + 1)^{-\alpha}} = 2 \tag{8.32}$$

obtained in Appendix B.3.

In addition, we carry out simulations of a cache operating under the CLIMB policy when the IRM input has a Zipf-like popularity pmf $p_\alpha$. We set the number of documents $N = 1{,}000$ and the cache size $M = 100$. Figures 8.9 and 8.10 show the miss rate of the CLIMB policy when $\alpha$ is small ($0 < \alpha < 1$) and large ($\alpha > 1$), respectively. As for the LRU miss rate, the CLIMB miss rate appears to be decreasing as the skewness parameter $\alpha$ increases across the entire range of $\alpha$, thereby suggesting the following conjecture.


Conjecture 8.9 For arbitrary cache size $M$ and number of documents $N$, the function $\alpha \to M_{\mathrm{CL}}(p_\alpha)$ is strictly decreasing on $[0, \infty)$.


8.4  The  output  under  the  CLIMB  policy 


8.4.1  CLIMB  is  a  good  policy 


From the expression (8.27), for each $i = 1, \ldots, N$, we have

$$m_{\mathrm{CL}}(i; p) = \sum_{s \in \Lambda_i(M; \mathcal{N})} \mu_{\mathrm{CL}}(s; p) = \frac{1}{K_{\mathrm{CL}}} \sum_{s \in \Lambda_i(M; \mathcal{N})} \prod_{\ell=1}^{M} p(i_\ell)^{M-\ell+1} \tag{8.33}$$

and by Theorem 5.2,

$$p^\star_{\mathrm{CL}}(i) = \frac{p(i)}{M_{\mathrm{CL}}(p) K_{\mathrm{CL}}} \sum_{s \in \Lambda_i(M; \mathcal{N})} \prod_{\ell=1}^{M} p(i_\ell)^{M-\ell+1} \tag{8.34}$$

for each $i = 1, \ldots, N$, where we have used the expression (5.8).

Figure 8.9: CLIMB miss rate when the IRM input has a Zipf-like popularity pmf $p_\alpha$ for $\alpha$ small ($0 < \alpha < 1$).

Figure 8.10: CLIMB miss rate when the IRM input has a Zipf-like popularity pmf $p_\alpha$ for $\alpha$ large ($\alpha > 1$).

Lemma 8.10 The CLIMB policy is a good policy.

Proof. The proof is essentially the same as that of the analogous result for the LRU policy given in Lemma 8.3. Here the validity of (8.15) follows from the expression (8.27). ■


8.4.2  Counterexamples 

Again, Corollary 6.8 and Lemma 8.10 might have created the expectation that the majorization comparison $p^\star_{\mathrm{CL}} \prec p$ also holds under the CLIMB policy for arbitrary input pmf $p$. This is not the case, as we show by counterexamples when the IRM input has the popularity pmf $p_\varepsilon$ defined at (8.16). Under this IRM input, it is a simple matter to see from (8.33) and (8.34) that the output popularity pmf $p^\star_{\mathrm{CL},\varepsilon}$ is of the form (8.17). Therefore, by Proposition 8.4, the comparison $p^\star_{\mathrm{CL},\varepsilon} \prec p_\varepsilon$ will not hold if $\delta(\varepsilon)$ exceeds the threshold given there. This is indeed the case when $\varepsilon$ is small enough; this result is demonstrated in the next theorem, whose proof can be found in Appendix C.2.


Theorem 8.11 Assume the IRM input to have the popularity pmf $p_\varepsilon$ for some $0 < \varepsilon < \frac{1}{N}$. Under the CLIMB policy, whenever

$$0 < \varepsilon < \frac{1}{2N - 1}, \tag{8.35}$$

the comparison $p^\star_{\mathrm{CL},\varepsilon} \prec p_\varepsilon$ does not hold provided that the number of documents $N$ and the cache size $M$ satisfy the condition $N > M > 2$.


For instance, consider $p_\varepsilon$ with parameters $N = 10$ and $\varepsilon = 0.05$ and set the cache size $M = 4$. With these parameters, $\delta(\varepsilon) = 0.1110$ and the assumptions of Theorem 8.11 are satisfied. Thus, the comparison $p^\star_{\mathrm{CL},\varepsilon} \prec p_\varepsilon$ does not hold. However, as was found in the case of the LRU policy, the entropy comparison is valid in that the entropy of $p_\varepsilon$ is smaller than the entropy of $p^\star_{\mathrm{CL},\varepsilon}$, i.e.,

$$0.7283 = H(p_\varepsilon) < H(p^\star_{\mathrm{CL},\varepsilon}) = 0.9560,$$

suggesting that $p^\star_{\mathrm{CL},\varepsilon}$ is more balanced than $p_\varepsilon$ in the sense of entropy comparison.
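The input entropy quoted above can be checked numerically. The sketch below rests on two assumptions that are not restated in this section: that the pmf defined at (8.16) places mass $1 - (N-1)\varepsilon$ on the first document and mass $\varepsilon$ on each of the others, and that the entropy is taken with base-10 logarithms. Under these assumptions the value $0.7283$ is recovered for $N = 10$ and $\varepsilon = 0.05$; the function names are hypothetical.

```python
from math import log10


def p_epsilon(num_docs, eps):
    """Assumed form of the pmf (8.16): one heavy document, the rest with mass eps."""
    return [1.0 - (num_docs - 1) * eps] + [eps] * (num_docs - 1)


def entropy_base10(p):
    """Shannon entropy with base-10 logarithms (an assumption about the units used)."""
    return -sum(x * log10(x) for x in p if x > 0.0)


h_in = entropy_base10(p_epsilon(10, 0.05))  # approximately 0.7283
```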

We next give counterexamples when the IRM input has a Zipf-like popularity pmf (6.4)-(6.5). Assume $M = 3$, $N = 4$ and the IRM input has a Zipf-like popularity pmf (6.4)-(6.5) with $\alpha = 3$. With these parameters, we have computed the output popularity pmf under the CLIMB policy using (8.34). The numerical values of both input and output popularity pmfs are presented in Table 8.2.

Table 8.2: $p_\alpha$ and $p^\star_{\mathrm{CL},\alpha}$ under the CLIMB policy when the IRM input has a Zipf-like popularity pmf $p_\alpha$ with parameter $\alpha = 3$

i                                    1        2        3        4
$p_\alpha(i)$                        0.8491   0.1061   0.0314   0.0133
$p^\star_{\mathrm{CL},\alpha}(i)$    0.0027   0.1386   0.4000   0.4587

As in the case of the LRU policy, the pmfs $p_\alpha$ and $p^\star_{\mathrm{CL},\alpha}$ are not comparable in the majorization ordering. The arguments are similar to those given for the LRU policy, and are therefore omitted. Moreover, a result analogous to Theorem 8.6 holds for the CLIMB policy. It is given next, with a proof available in Appendix B.4.
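The entries of Table 8.2 can be recomputed by enumerating the cache states in (8.33)-(8.34). The following Python sketch is one possible way to do so; the function names are hypothetical, documents are indexed from 0, and the Zipf-like pmf is taken to be $p_\alpha(i) \propto i^{-\alpha}$, which agrees with the input row of the table.

```python
from itertools import permutations
from math import prod


def zipf_pmf(num_docs, alpha):
    """Zipf-like pmf: p(i) proportional to i**(-alpha), i = 1, ..., num_docs."""
    weights = [i ** (-alpha) for i in range(1, num_docs + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def climb_output_pmf(p, cache_size):
    """Output popularity pmf of the CLIMB policy under IRM, following (8.33)-(8.34)."""
    n = len(p)
    weights = {
        s: prod(p[d] ** (cache_size - pos) for pos, d in enumerate(s))
        for s in permutations(range(n), cache_size)
    }
    k_cl = sum(weights.values())
    # (8.33): stationary probability that document i is not in the cache
    absent = [sum(w for s, w in weights.items() if i not in s) / k_cl for i in range(n)]
    miss_rate = sum(p[i] * absent[i] for i in range(n))
    # (8.34): miss probability of document i, normalized by the overall miss rate
    return [p[i] * absent[i] / miss_rate for i in range(n)]


p_in = zipf_pmf(4, 3.0)
p_out = climb_output_pmf(p_in, 3)  # compare with the two rows of Table 8.2
```

With $M = 3$, $N = 4$ and $\alpha = 3$, the most popular document receives the smallest output probability, below $p_\alpha(4)$, which is what rules out the majorization comparison.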

Theorem 8.12 Assume the IRM input to have a Zipf-like popularity pmf $p_\alpha$ for some $\alpha > 0$. If the number of documents $N$ and the cache size $M$ satisfy the condition (8.26), then under the CLIMB policy, there exists $\alpha^\star = \alpha^\star(M, N)$ such that $p^\star_{\mathrm{CL},\alpha} \prec p_\alpha$ does not hold whenever $\alpha > \alpha^\star$.

8.4.3 A conjecture

Here as well, we venture that a conjecture similar to Conjecture 8.7 is also valid for the CLIMB policy when the IRM input popularity pmf is a Zipf-like distribution (6.4)-(6.5).

Conjecture 8.13 Assume the IRM input to have a Zipf-like popularity pmf $p_\alpha$ for some $\alpha > 0$. For each $N = 1, 2, \ldots$, under the CLIMB policy, there exists an integer $M^\star = M^\star(\alpha; N)$ with $1 < M^\star < N$ such that $p^\star_{\mathrm{CL},\alpha} \prec p_\alpha$ whenever $M = 1, \ldots, M^\star$.

A number of simulation experiments have been carried out under the CLIMB policy, as was done for the LRU policy, to support Conjecture 8.13. The discussion of the experimental results shown in Figures 8.11 and 8.12 is similar to that given in Section 8.2.3 for the LRU policy and is omitted.


Figure 8.11: CLIMB output popularity pmf with different cache sizes $M$ when the IRM input has a Zipf-like popularity pmf $p_\alpha$ with (a) $\alpha = 0.8$, (b) $\alpha = 1$ and (c) $\alpha = 2$. Documents are arranged in the original order of the input pmf $p_\alpha$.

Figure 8.12: CLIMB output popularity pmf with different cache sizes $M$ when the IRM input has a Zipf-like popularity pmf $p_\alpha$ with (a) $\alpha = 0.8$, (b) $\alpha = 1$ and (c) $\alpha = 2$. Documents are ranked according to their probabilities.

Chapter 9


Comparing  Temporal  Correlations 


As was done for popularity, it is natural to seek an appropriate notion which can capture the strength of temporal correlations in streams of requests. Loosely speaking, temporal correlations are understood as the likelihood that a document will be requested in the near future, given that it has been requested in the recent past. Indeed, it is observed in [56] that Web traces usually exhibit short-term temporal correlations in the sense that the probability of requesting a particular document, given that the document was recently requested, is higher than what it would be if the document had not been recently requested.

In this chapter, we develop a notion that can capture the strength of temporal correlations in Web request streams using the concepts of positive dependence introduced in Chapter 3. Specifically, relying on the notion of supermodular ordering [Definition 3.4], we define the TC ordering [Definition 9.1] for comparing two streams of requests on the basis of the strength of their temporal correlations.

We then apply the TC ordering to investigate the existence of temporal correlations in several Web request models that are believed to exhibit such correlations, namely, the higher-order Markov chain model (HOMM), the partial Markov chain model (PMM) and the Least-Recently-Used stack model (LRUSM). Lastly, with the help of the TC ordering, we establish a version of the statement to the effect that “the stronger the temporal correlations, the smaller the miss rate” when the input to the cache is modeled by the PMM. Specific results and conjectures on this folk theorem when the input streams are modeled by the HOMM and by the LRUSM are provided.

9.1  Temporal  correlations  via  positive  dependence 

Given a stream of requests $R = \{R_t, t = 0, 1, \ldots\}$, we define for each $i = 1, \ldots, N$, the rvs

$$V_t(i) = \mathbf{1}[R_t = i], \quad t = 0, 1, \ldots, \tag{9.1}$$

i.e., the rv $V_t(i)$ is the indicator function of the event that the request at time $t$ is made to document $i$. If the sequence of requests $\{R_t, t = 0, 1, \ldots\}$ were to exhibit some form of temporal correlations, then a request to document $i$ would likely be followed by a burst of references to document $i$ in the near future. This corresponds to the presence of positive dependencies in the sequence $\{V_t(i), t = 0, 1, \ldots\}$ and leads naturally to the following definition of the Temporal Correlations ordering (TC ordering, for short):

Definition 9.1 The request stream $R^1 = \{R^1_t, t = 0, 1, \ldots\}$ is said to have weaker temporal correlations than the request stream $R^2 = \{R^2_t, t = 0, 1, \ldots\}$, a situation denoted

$$R^1 \le_{\mathrm{TC}} R^2, \tag{9.2}$$

if for each $i = 1, \ldots, N$, the comparison

$$\{V^1_t(i), t = 0, 1, \ldots\} \le_{\mathrm{sm}} \{V^2_t(i), t = 0, 1, \ldots\}$$

holds, where for each $k = 1, 2$, the rvs $\{V^k_t(i), t = 0, 1, \ldots\}$ denote the indicator process associated with $R^k$ through (9.1).

Under this definition, whenever $R^1 \le_{\mathrm{TC}} R^2$, it follows from the equi-marginal property (3.3) of the sm ordering that

$$\mathbf{P}[V^1_t(i) = 1] = \mathbf{P}[V^2_t(i) = 1], \quad i = 1, \ldots, N,$$

or equivalently that

$$\mathbf{P}[R^1_t = i] = \mathbf{P}[R^2_t = i], \quad i = 1, \ldots, N, \tag{9.3}$$

for all $t = 0, 1, \ldots$. Therefore, under the assumption that for each $k = 1, 2$, the limits (4.2) exist as constants for the request stream $R^k$, we have

$$p^k(i) = \mathbf{E}\left[ \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[R^k_\tau = i] \right] = \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{P}[R^k_\tau = i], \quad i = 1, \ldots, N,$$

by the Bounded Convergence Theorem. Combining this last equation and (9.3) immediately leads to $p^1 = p^2$, i.e., the comparison $R^1 \le_{\mathrm{TC}} R^2$ requires that the request streams $R^1$ and $R^2$ have the same popularity profile. In other words, the TC ordering captures only the contribution from temporal correlations to locality of reference.

Proposition 9.2 For a request stream $R$, if each of the indicator processes $\{V_t(i), t = 0, 1, \ldots\}$, $i = 1, \ldots, N$, associated with $R$ is PSMD, then it holds that

$$\hat{R} \le_{\mathrm{TC}} R$$

where $\hat{R}$ is the independent version of $R$.

When the request stream $R$ is a stationary sequence, the independent version $\hat{R}$ of $R$ is simply the IRM whose popularity pmf is the common marginal of the request stream $R$.

Proof. Fix $i = 1, \ldots, N$. Under the enforced assumptions, the sequence $\{V_t(i), t = 0, 1, \ldots\}$ associated with $R$ is PSMD. This amounts to

$$\{\hat{V}_t(i), t = 0, 1, \ldots\} \le_{\mathrm{sm}} \{V_t(i), t = 0, 1, \ldots\}$$

where the sequence $\{\hat{V}_t(i), t = 0, 1, \ldots\}$ is the independent version of the indicator sequence $\{V_t(i), t = 0, 1, \ldots\}$. With $\hat{R} = \{\hat{R}_t, t = 0, 1, \ldots\}$ being the independent version of the request stream $R$, it is plain that

$$\{\hat{V}_t(i), t = 0, 1, \ldots\} =_{\mathrm{st}} \{\mathbf{1}[\hat{R}_t = i], t = 0, 1, \ldots\}, \quad i = 1, \ldots, N,$$

and the proof is completed. ■


In what follows, we investigate whether various request models of interest display temporal correlations in the sense of the TC ordering. These models include the higher-order Markov chain model, the partial Markov chain model and the Least-Recently-Used stack model.


9.2  Higher-order  Markov  chain  models  (HOMM) 

Several  higher-order  Markov  chain  models  have  been  used  to  characterize  Web  request 
streams  (e.g.,  see  [19,  28,  56]  and  references  therein)  due  to  their  ability  to  capture  some 
of  the  observed  temporal  correlations.  Here  we  rely  on  a  model,  recently  proposed  by 
Psounis  et  al.  [56],  which  is  capable  of  capturing  both  the  long-term  popularity  and 
short-term  temporal  correlations  of  Web  request  streams. 

The model can be described as follows: Let the $\mathcal{N}$-valued rvs $\{R_0, \ldots, R_{h-1}\}$ be the initial requests and let $\{Y_t, t = 0, 1, \ldots\}$ be a sequence of i.i.d. $\mathcal{N}$-valued rvs with $\mathbf{P}[Y_t = i] = p(i)$ for each $i = 1, \ldots, N$. The pmf $p = (p(1), \ldots, p(N))$ is assumed to be admissible (4.3) and, as we shall see shortly, it will turn out to be the popularity pmf of this model. Next, with $0 < \alpha_1, \ldots, \alpha_h < 1$ and $\sum_{k=1}^{h} \alpha_k < 1$, let $\{Z_t, t = 0, 1, \ldots\}$ be another sequence of i.i.d. $\{0, 1, \ldots, h\}$-valued rvs with

$$\mathbf{P}[Z_t = k] = \alpha_k, \quad k = 1, \ldots, h \quad \text{and} \quad \mathbf{P}[Z_t = 0] = \beta = 1 - \sum_{k=1}^{h} \alpha_k > 0,$$

i.e., the rv $Z_t$ is distributed according to the pmf $\boldsymbol{\alpha} = (\beta, \alpha_1, \ldots, \alpha_h)$. The collections of rvs $\{R_0, \ldots, R_{h-1}\}$, $\{Y_t, t = 0, 1, \ldots\}$ and $\{Z_t, t = 0, 1, \ldots\}$ are mutually independent. For each $t = h, h + 1, \ldots$, the request $R_t$ is described by the evolution

$$R_t = \mathbf{1}[Z_t = 0] Y_t + \sum_{k=1}^{h} \mathbf{1}[Z_t = k] R_{t-k}. \tag{9.4}$$

In words, the request $R_t$ is made to the same document requested at time $t - k$, namely $R_{t-k}$, with probability $\alpha_k$ for some $k = 1, \ldots, h$; otherwise $R_t = Y_t$, i.e., it is chosen independently of the past according to the popularity pmf $p$.
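The evolution (9.4) translates directly into a simple generator of synthetic request streams. The Python sketch below is meant only as an illustration: the function name is hypothetical, the $h$ initial requests are drawn i.i.d. from $p$ (one admissible choice among many), and $Y_t$ is sampled only when $Z_t = 0$, which does not change the law of the stream.

```python
import random


def simulate_homm(p, alphas, num_requests, seed=0):
    """Generate requests R_0, ..., R_{num_requests-1} following the evolution (9.4).

    p      : popularity pmf over documents 1, ..., N
    alphas : (alpha_1, ..., alpha_h), with beta = 1 - sum(alphas) > 0
    """
    beta = 1.0 - sum(alphas)
    if beta <= 0.0:
        raise ValueError("the model requires beta = 1 - sum(alphas) > 0")
    rng = random.Random(seed)
    h = len(alphas)
    docs = list(range(1, len(p) + 1))
    lags = list(range(h + 1))              # possible values of Z_t
    lag_weights = [beta] + list(alphas)    # pmf (beta, alpha_1, ..., alpha_h)
    # Assumption made here: initial requests are i.i.d. draws from p.
    requests = rng.choices(docs, weights=p, k=h)
    while len(requests) < num_requests:
        z = rng.choices(lags, weights=lag_weights)[0]
        if z == 0:
            requests.append(rng.choices(docs, weights=p)[0])  # fresh draw Y_t
        else:
            requests.append(requests[-z])                     # repeat R_{t-z}
    return requests[:num_requests]
```

Taking `alphas = []` gives an IRM with popularity pmf `p`, while `alphas = [1 - beta]` corresponds to a first-order model in which the previous request is repeated with probability `1 - beta`.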

The requests $\{R_t, t = 0, 1, \ldots\}$ form an $h$th-order Markov chain since the value of $R_t$ depends only on the rvs $R_{t-1}, \ldots, R_{t-h}$. In fact, for $t = h, h + 1, \ldots$, we have from (9.4) that for any $(i_0, \ldots, i_{t-1})$ in $\mathcal{N}^t$,

$$\mathbf{P}[R_t = i \mid R_\tau = i_\tau, \tau = 0, \ldots, t - 1] = \beta p(i) + \sum_{k=1}^{h} \alpha_k \mathbf{1}[i_{t-k} = i] \tag{9.5}$$
$$= \mathbf{P}[R_t = i \mid R_\tau = i_\tau, \tau = t - h, \ldots, t - 1].$$

With $\beta > 0$, this $h$th-order Markov chain is irreducible and aperiodic on its finite state space; its stationary distribution exists and is unique. It can be shown [56] that

$$\lim_{t \to \infty} \mathbf{P}[R_t = i] = \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[R_\tau = i] = p(i) \quad \text{a.s.}$$

for each $i = 1, \ldots, N$, and it is therefore warranted to call the pmf $p$ the long-term popularity pmf of this request model. Moreover, there exists a unique stationary version, still denoted thereafter by $\{R_t, t = 0, 1, \ldots\}$. The parameters of the model are the history window size $h$, the pmf $\boldsymbol{\alpha}$ and the popularity pmf $p$, and we shall refer to this model as HOMM($h$, $\boldsymbol{\alpha}$, $p$).

That the HOMM($h$, $\boldsymbol{\alpha}$, $p$) exhibits temporal correlations is formalized in the next result.

Theorem 9.3 Assume the request stream $R = \{R_t, t = 0, 1, \ldots\}$ to be modeled according to the stationary HOMM($h$, $\boldsymbol{\alpha}$, $p$). Then, for each $i = 1, \ldots, N$, the indicator sequence $\{V_t(i), t = 0, 1, \ldots\}$ associated with the request stream $R$ is PSMD, whence

$$\hat{R} \le_{\mathrm{TC}} R \tag{9.6}$$

where $\hat{R}$ is the IRM with popularity pmf $p$.

Proof. In order to show that the sequences $\{V_t(i), t = 0, 1, \ldots\}$, $i = 1, \ldots, N$, are PSMD, we shall make use of another sequence of $\mathcal{N}$-valued rvs $\tilde{R} = \{\tilde{R}_t, t = 0, 1, \ldots\}$ constructed as follows: The rvs $\{\tilde{R}_0, \ldots, \tilde{R}_{h-1}\}$ are i.i.d. rvs distributed according to the pmf $p$ and the rvs $\{\tilde{R}_t, t = h, h + 1, \ldots\}$ are generated through the evolution (9.4) with the help of mutually independent sequences of i.i.d. rvs $\{Y_t, t = 0, 1, \ldots\}$ and $\{Z_t, t = 0, 1, \ldots\}$ distributed according to the pmfs $p$ and $\boldsymbol{\alpha}$, respectively. The collections of rvs $\{Y_t, t = 0, 1, \ldots\}$ and $\{Z_t, t = 0, 1, \ldots\}$ are taken to be independent of the rvs $\{\tilde{R}_0, \ldots, \tilde{R}_{h-1}\}$. From this construction, the process $\tilde{R} = \{\tilde{R}_t, t = 0, 1, \ldots\}$ is an $h$th-order Markov chain and with $\beta > 0$, we get

$$\{\tilde{R}_{t+\tau}, t = 0, 1, \ldots\} \Longrightarrow_{\tau} \{R_t, t = 0, 1, \ldots\}, \tag{9.7}$$

where $\Longrightarrow_{\tau}$ denotes weak convergence as $\tau$ goes to infinity.

Fix $i = 1, \ldots, N$. Let $\{\tilde{V}_t(i) = \mathbf{1}[\tilde{R}_t = i], t = 0, 1, \ldots\}$ be the indicator sequence associated with the sequence $\tilde{R}$ defined earlier. We will show that this sequence $\{\tilde{V}_t(i), t = 0, 1, \ldots\}$ is CIS. To do so, for each $t = 0, 1, \ldots$, set $\tilde{V}^t(i) = (\tilde{V}_0(i), \ldots, \tilde{V}_t(i))$. Because the sequence $\{\tilde{V}_t(i), t = 0, 1, \ldots\}$ is a sequence of $\{0, 1\}$-valued rvs, it is CIS [59, 67] if for each $t = 0, 1, \ldots$, the inequality

$$\mathbf{P}[\tilde{V}_{t+1}(i) = 1 \mid \tilde{V}^t(i) = x^t] \le \mathbf{P}[\tilde{V}_{t+1}(i) = 1 \mid \tilde{V}^t(i) = y^t] \tag{9.8}$$

holds for all vectors $x^t = (x_0, \ldots, x_t)$ and $y^t = (y_0, \ldots, y_t)$ in $\{0, 1\}^{t+1}$ with $x^t \le y^t$ componentwise.

For $t = 0, 1, \ldots, h - 2$, it holds for all $x^t = (x_0, \ldots, x_t)$ in $\{0, 1\}^{t+1}$ that

$$\mathbf{P}[\tilde{V}_{t+1}(i) = 1 \mid \tilde{V}^t(i) = x^t] = \mathbf{P}[\tilde{V}_{t+1}(i) = 1] = \mathbf{P}[\tilde{R}_{t+1} = i] = p(i) \tag{9.9}$$

by independence of the rvs $\tilde{R}_0, \ldots, \tilde{R}_{h-1}$, and the inequality (9.8) is obtained for each $t = 0, 1, \ldots, h - 2$. Next, for $t = h - 1, h, \ldots$, and $x^t = (x_0, \ldots, x_t)$ in $\{0, 1\}^{t+1}$, let $(i_0, \ldots, i_t)$ be an element in $\mathcal{N}^{t+1}$ with the property that for each $k = 0, \ldots, t$, $i_k = i$ if $x_k = 1$ and $i_k \ne i$ if $x_k = 0$. With such an element, we obtain from (9.5) that

$$\mathbf{P}[\tilde{V}_{t+1}(i) = 1 \mid (\tilde{R}_0, \ldots, \tilde{R}_t) = (i_0, \ldots, i_t)] = \mathbf{P}[\tilde{R}_{t+1} = i \mid (\tilde{R}_0, \ldots, \tilde{R}_t) = (i_0, \ldots, i_t)]$$
$$= \beta p(i) + \sum_{k=1}^{h} \alpha_k \mathbf{1}[i_{t+1-k} = i]$$
$$= \beta p(i) + \sum_{k=1}^{h} \alpha_k x_{t+1-k}. \tag{9.10}$$

Since (9.10) holds for any $(i_0, \ldots, i_t)$ in $\mathcal{N}^{t+1}$ satisfying the property above, a standard preconditioning argument readily yields

$$\mathbf{P}[\tilde{V}_{t+1}(i) = 1 \mid \tilde{V}^t(i) = x^t] = \beta p(i) + \sum_{k=1}^{h} \alpha_k x_{t+1-k}. \tag{9.11}$$

This last expression being monotone increasing in $x^t = (x_0, \ldots, x_t)$, we obtain the inequality (9.8) for each $t = h - 1, h, \ldots$.

Thus, the inequalities (9.8) hold for all $t = 0, 1, \ldots$. This implies that the sequence $\{\tilde{V}_t(i), t = 0, 1, \ldots\}$ is CIS, whence indeed PSMD by Theorem 3.10, i.e.,

$$\{\hat{\tilde{V}}_t(i), t = 0, 1, \ldots\} \le_{\mathrm{sm}} \{\tilde{V}_t(i), t = 0, 1, \ldots\} \tag{9.12}$$

where $\{\hat{\tilde{V}}_t(i), t = 0, 1, \ldots\}$ is the independent version of $\{\tilde{V}_t(i), t = 0, 1, \ldots\}$. Now, recalling (9.7), it is plain that

$$\{\hat{\tilde{V}}_{t+\tau}(i), t = 0, 1, \ldots\} \Longrightarrow_{\tau} \{\hat{V}_t(i), t = 0, 1, \ldots\} \tag{9.13}$$

where $\{\hat{V}_t(i), t = 0, 1, \ldots\}$ is a sequence of i.i.d. $\{0, 1\}$-valued rvs with $\mathbf{P}[\hat{V}_0(i) = 1] = p(i)$ and is exactly the independent version of $\{V_t(i), t = 0, 1, \ldots\}$. By invoking the fact that the sm ordering is closed under weak convergence [52, Thm. 3.9.8, p. 116], we conclude from (9.7), (9.12) and (9.13) that

$$\{\hat{V}_t(i), t = 0, 1, \ldots\} \le_{\mathrm{sm}} \{V_t(i), t = 0, 1, \ldots\}.$$

Therefore, the sequence $\{V_t(i), t = 0, 1, \ldots\}$ is PSMD for each $i = 1, \ldots, N$, and by Proposition 9.2, the comparison $\hat{R} \le_{\mathrm{TC}} R$ holds with $\hat{R}$ being the independent version of $R$. ■


9.3  Partial  Markov  chain  models  (PMM) 

The partial Markov chain model was introduced early on in the literature as a reference model for computer memory paging [2]. It is a subclass of higher-order Markov chain models and corresponds to HOMM($h$, $\boldsymbol{\alpha}$, $p$) with parameter $h = 1$. In that case, we have $\boldsymbol{\alpha} = (\beta, \alpha_1)$ where $\alpha_1 = 1 - \beta$, and we refer to this model as PMM($\beta$, $p$).

Under this model, with probability $1 - \beta$, $R_t = R_{t-1}$; otherwise, with probability $\beta$, $R_t = Y_t$, i.e., $R_t$ is drawn independently of the past according to the popularity pmf $p$. Therefore, it is natural to expect that when the popularity pmf $p$ is held fixed, the smaller the value of the correlation parameter $\beta$, the stronger the temporal correlations exhibited by the PMM($\beta$, $p$). In the extreme cases, as $\beta \uparrow 1$, the PMM($\beta$, $p$) becomes the IRM with popularity pmf $p$ and there are no temporal correlations. On the other hand, as $\beta \downarrow 0$, all the requests are made to the same document, hence displaying the strongest possible form of temporal correlations. The following result, which contains Theorem 9.3 when $h = 1$, formalizes these statements with the help of the TC ordering, thereby confirming the intuition that the parameter $\beta$ of PMM($\beta$, $p$) is indeed a measure of the strength of temporal correlations.

Theorem 9.4 Assume that for each $k = 1, 2$, the request stream $R^{\beta_k} = \{R^{\beta_k}_t, t = 0, 1, \ldots\}$ is modeled according to the stationary PMM($\beta_k$, $p$). If $0 < \beta_2 < \beta_1$, then

$$R^{\beta_1} \le_{\mathrm{TC}} R^{\beta_2}. \tag{9.14}$$

The proof of this theorem relies on the following comparison of Markov chains under the supermodular ordering due to Bäuerle [8].

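Theorem 9.5 Let $X = \{X_t, t = 0, 1, \ldots\}$ and $X' = \{X'_t, t = 0, 1, \ldots\}$ be two stationary Markov chains on $\{0, 1, \ldots, n\}$ with transition matrices $P$ and $P'$, respectively. For $\gamma_0, \ldots, \gamma_n \ge 0$ with $0 < \sum_{j=0}^{n} \gamma_j \le 1$, define the $(n + 1) \times (n + 1)$ matrix

$$Q(\gamma_0, \ldots, \gamma_n) = \begin{pmatrix} 1 - \sum_{j \ne 0} \gamma_j & \gamma_1 & \cdots & \gamma_n \\ \gamma_0 & 1 - \sum_{j \ne 1} \gamma_j & \cdots & \gamma_n \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_0 & \gamma_1 & \cdots & 1 - \sum_{j \ne n} \gamma_j \end{pmatrix}. \tag{9.15}$$

With $P = Q(\gamma_0, \ldots, \gamma_n)$ and $P' = Q(c\gamma_0, \ldots, c\gamma_n)$ for some $0 < c < 1$, it holds that

$$X \le_{\mathrm{sm}} X'.$$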

Proof of Theorem 9.4. Fix $i = 1, \ldots, N$. Given a sequence $R^\beta = \{R^\beta_t, t = 0, 1, \ldots\}$ modeled according to the PMM($\beta$, $p$), it follows from (9.11) that the sequence $\{V^\beta_t(i), t = 0, 1, \ldots\}$ associated with $R^\beta$ is a Markov chain on $\{0, 1\}$ with

$$\mathbf{P}[V^\beta_{t+1}(i) = 1 \mid V^\beta_t(i) = x_t, \ldots, V^\beta_0(i) = x_0] = \beta p(i) + (1 - \beta) x_t, \quad t = 0, 1, \ldots,$$

for any $(x_0, \ldots, x_t)$ in $\{0, 1\}^{t+1}$. Its transition matrix $P_\beta(i)$ is simply given by

$$P_\beta(i) = \begin{pmatrix} 1 - \beta p(i) & \beta p(i) \\ \beta(1 - p(i)) & 1 - \beta(1 - p(i)) \end{pmatrix}$$

or equivalently, in the notation (9.15), $P_\beta(i) = Q(\gamma_0, \gamma_1)$ where $\gamma_0 = \beta(1 - p(i))$ and $\gamma_1 = \beta p(i)$ with $0 < \gamma_0 + \gamma_1 = \beta \le 1$.

For two stationary PMM request streams $R^{\beta_1}$ and $R^{\beta_2}$ with $0 < \beta_2 < \beta_1$, we can always write $\beta_2 = c\beta_1$ with $0 < c = \frac{\beta_2}{\beta_1} < 1$. Thus, the sequences $\{V^{\beta_1}_t(i), t = 0, 1, \ldots\}$ and $\{V^{\beta_2}_t(i), t = 0, 1, \ldots\}$ have transition matrices

$$P_{\beta_1}(i) = Q(\gamma_0, \gamma_1) \quad \text{and} \quad P_{\beta_2}(i) = Q(c\gamma_0, c\gamma_1),$$

respectively, with $\gamma_0 = \beta_1(1 - p(i))$, $\gamma_1 = \beta_1 p(i)$ and $c = \frac{\beta_2}{\beta_1}$. By applying Theorem 9.5, we obtain the comparison

$$\{V^{\beta_1}_t(i), t = 0, 1, \ldots\} \le_{\mathrm{sm}} \{V^{\beta_2}_t(i), t = 0, 1, \ldots\}$$

for each $i = 1, \ldots, N$, and the conclusion (9.14) follows upon recalling Definition 9.1 of the TC ordering. ■
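The identification of the transition matrix of the indicator chain with a matrix of the form (9.15) can be checked numerically. The short Python sketch below is purely illustrative, with hypothetical function names and states indexed $0, 1$; it also makes it easy to confirm that $(1 - p(i), p(i))$ is stationary for the indicator chain, in line with the equi-marginal requirement of the TC ordering.

```python
def q_matrix(gammas):
    """Matrix Q(gamma_0, ..., gamma_n) of (9.15): column j carries gamma_j off the diagonal."""
    n = len(gammas)
    total = sum(gammas)
    return [
        [1.0 - (total - gammas[r]) if c == r else gammas[c] for c in range(n)]
        for r in range(n)
    ]


def pmm_indicator_matrix(beta, p_i):
    """Transition matrix of the indicator chain of document i under PMM(beta, p)."""
    return [
        [1.0 - beta * p_i, beta * p_i],
        [beta * (1.0 - p_i), 1.0 - beta * (1.0 - p_i)],
    ]


def left_multiply(row, matrix):
    """Row vector times matrix, used to check stationarity."""
    return [sum(row[r] * matrix[r][c] for r in range(len(row))) for c in range(len(matrix[0]))]
```

For example, with $\beta = 0.4$ and $p(i) = 0.3$, `q_matrix([beta * (1 - p_i), beta * p_i])` and `pmm_indicator_matrix(beta, p_i)` agree entry by entry up to rounding.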


9.4  Least-Recently-Used  stack  models  (LRUSM) 

The  Least-Recently-Used  stack  model  (LRUSM)  has  long  been  known  to  be  a  good 
model  for  generating  the  sequence  of  requests  whose  statistical  properties  match  those 
of  observed  reference  streams  [24,  61].  We  first  state  the  definition  and  basic  properties 
of  the  LRUSM,  and  then  show  that  under  some  appropriate  assumptions  on  the  model, 
the  LRUSM  exhibits  stronger  strength  of  temporal  correlations  than  its  independent 
version  in  the  TC  ordering. 
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9.4.1  LRU  stack  and  stack  distance 

We  begin  with  the  notion  of  LRU  stack  and  stack  distance.  For  each  t  =  0, 1, . . the 
stack  Llt  =  (A(l),  •  •  • ,  Llt{N))  is  defined  as  an  element  in  A(N]Af),  i.e.,  Llt  is  an 
ordered  sequence  of  the  documents  {1, . . . ,  N}.  It  is  customary  to  assume  that  0(1)  is 
in  the  top  position  of  the  stack,  followed  by  Llt(2), . . Llt(N),  in  that  order. 

Given an initial stack $\Omega_0$ in $\Lambda(N; \mathcal{N})$, with any stream of requests $R = \{R_t,\ t = 0, 1, \ldots\}$,
we can associate a stack sequence $\{\Omega_t,\ t = 0, 1, \ldots\}$ through the following
recursive mechanism: For each $t = 0, 1, \ldots$, let $D_t$ denote the position of the document
$R_{t+1}$ in the stack $\Omega_t$, i.e., the rv $D_t$ is the unique element of $\{1, \ldots, N\}$ such that

\[ \Omega_t(D_t) = R_{t+1}. \]

The stack $\Omega_{t+1}$ is then given by

\[
\Omega_{t+1}(k) =
\begin{cases}
\Omega_t(D_t) & \text{if } k = 1 \\
\Omega_t(k-1) & \text{if } k = 2, \ldots, D_t \\
\Omega_t(k) & \text{if } k = D_t + 1, \ldots, N.
\end{cases}
\tag{9.16}
\]

In words, the document $\Omega_t(D_t) = R_{t+1}$ is moved up to the highest position (i.e., position 1)
in the stack $\Omega_{t+1}$ at time $t + 1$ and the documents $\Omega_t(1), \ldots, \Omega_t(D_t - 1)$ are
shifted down by one position while the documents $\Omega_t(D_t + 1), \ldots, \Omega_t(N)$ remain unchanged.
We refer to the rvs $\{D_t,\ t = 0, 1, \ldots\}$ so defined as the stack distance sequence
associated with the request stream R.
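To make the recursion (9.16) concrete, here is a minimal Python sketch; it is an illustration only and not part of the original development. The function name `stack_distances` and the representation of the stack as a list with the top position first are assumptions made for this example.

```python
def stack_distances(initial_stack, requests):
    """Apply the move-to-top recursion (9.16) to a list of requests.

    initial_stack: documents ordered from position 1 (top) to position N.
    requests:      the requests R_1, R_2, ... that follow the initial stack.
    Returns the stack distances D_0, D_1, ... and the final stack.
    """
    stack = list(initial_stack)
    distances = []
    for doc in requests:
        d = stack.index(doc) + 1   # D_t: 1-based position of R_{t+1} in the current stack
        distances.append(d)
        stack.pop(d - 1)           # take the document out of position D_t ...
        stack.insert(0, doc)       # ... and put it on top; positions 1..D_t-1 shift down by one
    return distances, stack
```

For instance, starting from the stack (1, 2, 3, 4) and serving the requests 3, 3, 1, 4, the sketch returns the distances 3, 1, 2, 4 and leaves the stack in the order (4, 1, 3, 2).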

Conversely, given the initial stack $\Omega_0$ in $\Lambda(N; \mathcal{N})$, with any sequence of $\mathcal{N}$-valued
rvs $\{D_t,\ t = 0, 1, \ldots\}$, we can use the stack operation (9.16) to generate a sequence
of $\Lambda(N; \mathcal{N})$-valued rvs $\{\Omega_t,\ t = 0, 1, \ldots\}$. A request stream R is readily generated
from this stack sequence by reading off the top of the stack, i.e., with $R_0 = \Omega_0(1)$,
we have

\[ R_{t+1} = \Omega_t(D_t) = \Omega_{t+1}(1), \quad t = 0, 1, \ldots. \tag{9.17} \]

Note that the rvs $\{D_t,\ t = 0, 1, \ldots\}$ constitute the stack distance sequence associated
with the request stream R defined at (9.17).

The  stack  and  stack  distance  introduced  above  are  often  referred  to  as  LRU  stack 
and  stack  distance,  respectively,  in  reference  to  the  popular  LRU  policy.  The  dynamics 
of  the  LRU  policy  are  best  described  through  the  notion  of  LRU  stack  and  stack  distance 
as we now briefly explain: Returning to (9.16), we see that the stack $\Omega_t$ at time t ranks
the documents according to their recency of reference with the most recently requested
document remaining at the highest stack position. For each $k = 1, \ldots, N$, the document
$\Omega_t(k)$ at position k in the stack $\Omega_t$ is the kth most recently referenced document at time
t, hence the name, LRU stack. Consequently, the documents $\Omega_t(1), \ldots, \Omega_t(M)$ in the
first M positions of the stack $\Omega_t$ simply yield the documents in cache under the LRU
policy with cache size M when the requests $R_0, \ldots, R_t$ have already been served, i.e.,
$S_{t+1} = \{\Omega_t(1), \ldots, \Omega_t(M)\}$ where $S_{t+1}$ is the LRU cache at time $t + 1$. With this
observation in mind, a miss of the LRU cache of size M will occur at time $t + 1$ if
$D_t > M$ and thus the miss rate (4.8) under the LRU policy can alternatively be given by
the limit

\[ M_{LRU}(R) = \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbf{1}[D_\tau > M] \quad a.s. \tag{9.18} \]

whenever  the  limit  exists. 
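The equivalence between LRU misses and the events $D_t > M$ can be checked on a finite trace. The sketch below is an illustration under our own assumptions: both function names are hypothetical, the cache size is taken between 1 and N, and the LRU cache is initialised with the top M entries of the initial stack so that both counts start from the same state.

```python
from collections import OrderedDict


def misses_via_stack_distance(initial_stack, requests, cache_size):
    """Count the requests whose stack distance exceeds the cache size."""
    stack = list(initial_stack)
    misses = 0
    for doc in requests:
        d = stack.index(doc) + 1            # stack distance of this request
        if d > cache_size:
            misses += 1
        stack.insert(0, stack.pop(d - 1))   # move-to-top update of (9.16)
    return misses


def misses_via_lru_cache(initial_stack, requests, cache_size):
    """Count misses of an explicit LRU cache holding cache_size documents."""
    # OrderedDict keeps the least recently used document first.
    cache = OrderedDict((doc, None) for doc in reversed(initial_stack[:cache_size]))
    misses = 0
    for doc in requests:
        if doc in cache:
            cache.move_to_end(doc)          # hit: the document becomes most recent
        else:
            misses += 1
            cache.popitem(last=False)       # miss: evict the least recently used
            cache[doc] = None
    return misses
```

On the stack (1, 2, 3, 4, 5) with requests 4, 4, 1, 5, 2, 3, 5 and M = 2, both functions count 5 misses.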

9.4.2  The  LRU  stack  model 

The  duality  between  streams  of  requests  and  stack  distances  embedded  in  (9.16)  can 
be  used  to  advantage  in  defining  sequences  of  requests  with  temporal  correlations.  We 
present  one  of  the  simplest  ways  to  do  just  that:  The  Least-Recently-Used  stack  model 
(LRUSM) with pmf a on $\mathcal{N}$ is defined as the request stream $R^a = \{R^a_t,\ t = 0, 1, \ldots\}$
whose stack distance sequence $\{D_t,\ t = 0, 1, \ldots\}$ is a collection of i.i.d. rvs distributed
according to the pmf a, i.e.,

\[ P[D_t = k] = a_k, \quad k = 1, \ldots, N; \ t = 0, 1, \ldots, \]

given some arbitrary initial stack $\Omega_0$ in $\Lambda(N; \mathcal{N})$. Throughout we assume that the rv $\Omega_0$
is independent of the stack distances $\{D_t,\ t = 0, 1, \ldots\}$. In fact, provided $a_N > 0$, when
the initial stack rv $\Omega_0$ is uniformly distributed over $\Lambda(N; \mathcal{N})$, the stack rvs $\{\Omega^a_t,\ t = 0, 1, \ldots\}$
form a stationary sequence, and so do the request rvs $\{R^a_t,\ t = 0, 1, \ldots\}$. This
fact is established in the process of proving Proposition 9.6 in Appendix D.1. We shall
denote this request model by LRUSM(a).
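A request stream of this type is easy to generate. The following sketch is only an illustration: the names `lrusm_requests` and `stack_distances` are hypothetical, the pmf a is passed as a list with `a[k-1]` playing the role of $a_k$, and the caller supplies the random number generator.

```python
import random


def lrusm_requests(a, initial_stack, length, rng):
    """Generate R_1, ..., R_length from i.i.d. stack distances with pmf a."""
    stack = list(initial_stack)
    positions = list(range(1, len(a) + 1))
    requests, distances = [], []
    for _ in range(length):
        d = rng.choices(positions, weights=a)[0]   # draw D_t with P[D_t = k] = a_k
        doc = stack.pop(d - 1)                     # document at position D_t ...
        stack.insert(0, doc)                       # ... moves to the top, as in (9.16)
        distances.append(d)
        requests.append(doc)                       # R_{t+1} = Omega_t(D_t), cf. (9.17)
    return requests, distances


def stack_distances(initial_stack, requests):
    """Recover the stack distances of a request list (the converse direction)."""
    stack = list(initial_stack)
    out = []
    for doc in requests:
        d = stack.index(doc) + 1
        out.append(d)
        stack.insert(0, stack.pop(d - 1))
    return out
```

Feeding the generated requests back into `stack_distances` with the same initial stack returns exactly the distances that were drawn, which is the duality between requests and stack distances used in the definition.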

From (9.18), the miss rate of the LRUSM(a) under the LRU policy with cache size
M is simply

\[ M_{LRU}(R^a) = P[D_t > M] = \sum_{k=M+1}^{N} a_k \tag{9.19} \]

by the Strong Law of Large Numbers. The LRU policy is known to be an optimal policy
for the LRUSM(a) in the sense that the LRU policy minimizes the miss rate of the
request stream $R^a$ over the class of replacement policies (4.5) if the stack distance pmf
a satisfies the LRU optimality condition [58]

\[ (N - k)\, a_k \ge \sum_{j=k+1}^{N} a_j, \quad k = 1, \ldots, N. \tag{9.20} \]
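Both (9.19) and (9.20) are simple functions of the pmf a and can be evaluated directly. The sketch below is illustrative only; the function names and the small numerical tolerance are our own choices, and (9.20) is read with a non-strict inequality.

```python
def lru_miss_rate(a, cache_size):
    """Tail sum of (9.19): a_{M+1} + ... + a_N, with a[k-1] standing for a_k."""
    return sum(a[cache_size:])


def satisfies_lru_optimality(a, tol=1e-12):
    """Check (N - k) * a_k >= a_{k+1} + ... + a_N for every k = 1, ..., N."""
    n = len(a)
    return all((n - k) * a[k - 1] >= sum(a[k:]) - tol for k in range(1, n + 1))
```

With a = (0.4, 0.3, 0.2, 0.1) the miss rate for M = 2 is 0.3 and the condition holds, whereas a = (0.1, 0.1, 0.1, 0.7) violates it already at k = 1.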

The popularity pmf of the LRUSM is discussed first in Proposition 9.6; its proof can
be found in Appendix D.1.

Proposition 9.6 Assume the request stream $R^a = \{R^a_t,\ t = 0, 1, \ldots\}$ to be modeled
according to the LRUSM(a). If $a_N > 0$, then for each $i = 1, \ldots, N$, it holds that

\[ p_a(i) = \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[R^a_\tau = i] = \frac{1}{N} \quad a.s. \tag{9.21} \]

Thus, under LRUSM, as every document is equally popular, locality of reference is
expressed  solely  through  temporal  correlations  with  no  contribution  from  the  popularity 
of  documents.  This  was  found  to  be  a  drawback  of  the  LRUSM  for  characterizing  Web 
request  streams  and  several  variants  of  this  model  have  been  proposed  to  accommodate 
this  shortcoming  [4,  14,  18]. 

9.4.3  Temporal  correlations  in  LRUSM 

As was done with the HOMM, we show that the TC ordering also captures the strength
of temporal correlations exhibited by the LRUSM. Recall the sequence of indicator functions
$\{V^a_t(i) = \mathbf{1}[R^a_t = i],\ t = 0, 1, \ldots\}$, $i = 1, \ldots, N$, associated with the LRUSM
request stream $\{R^a_t,\ t = 0, 1, \ldots\}$. The main result is contained in

Theorem 9.7 Assume the request stream $R^a = \{R^a_t,\ t = 0, 1, \ldots\}$ to be modeled
according to the LRUSM(a) with stack distance pmf a satisfying

\[ a_1 \ge a_2 \ge \cdots \ge a_N > 0. \tag{9.22} \]

Then, for each $i = 1, \ldots, N$, the indicator sequence $\{V^a_t(i),\ t = 0, 1, \ldots\}$ associated
with the request stream $R^a$ is CIS, whence

\[ \hat{R}^a \le_{TC} R^a \tag{9.23} \]

where $\hat{R}^a$ is the independent version of $R^a$.

A proof of Theorem 9.7 can be found in Appendix D.2. In view of Proposition 9.6,
when the LRUSM request stream $R^a$ is stationary, its independent version $\hat{R}^a$ is simply
the IRM with uniform popularity pmf $u = (\frac{1}{N}, \ldots, \frac{1}{N})$. In fact, it is not hard to see that
the stationary LRUSM(u) indeed coincides with the IRM with uniform popularity pmf
u. Notice that the condition (9.22) for the LRUSM(a) to exhibit temporal correlations
in the sense of the TC ordering (9.23) does imply the LRU optimality condition (9.20).
This  confirms  the  intuition  that  the  LRU  policy  is  designed  to  work  best  with  the  stream 
that  exhibits  temporal  correlations  amongst  its  requests. 

9.5  Folk  theorem  on  miss  rates 

With the help of the TC ordering, we can now use the results of Theorems 9.3, 9.4 and
9.7 to explore the folk theorem to the effect that the stronger the temporal
correlations, the smaller the miss rate. Specific results and conjectures are provided
next for the PMM, the HOMM and the LRUSM, in that order.

9.5.1  PMM 

The miss rates of PMM under demand-driven cache replacement policies have been
previously considered in [2]. For particular caching policies such as LRU and FIFO, the
miss rate under PMM($\beta$, p) is shown to be proportional to the miss rate of the IRM with
the same popularity pmf p. We first demonstrate this fact in some generality and then
use it to compare the miss rates of two PMM streams with different strengths of temporal
correlations.

As we seek to evaluate the limit (4.8) for the PMM($\beta$, p) under the cache replacement
policy $\pi$, we shall need the following definitions: For each $T = 1, 2, \ldots$, define

\[ \lambda(T) = \sum_{t=1}^{T} \mathbf{1}[Z_t = 0] \]

as the number of times from time 1 up to time T that the requests are chosen independently
of the past according to the popularity pmf p. Also, for each $k = 1, 2, \ldots$, let
$\gamma(k) = \inf\{t = 1, 2, \ldots : \lambda(t) = k\}$. Under demand-driven caching with the PMM input,
a miss can only occur at the time epochs $\gamma(k)$ ($k = 1, 2, \ldots$) at which point we have
$R^\beta_{\gamma(k)} = Y_{\gamma(k)}$. Therefore, it follows from the definition of the rvs $\{\gamma(k),\ k = 1, 2, \ldots\}$
that

\[
\sum_{t=1}^{T} \mathbf{1}[R^\beta_t \notin S_t]
= \sum_{k=1}^{\lambda(T)} \mathbf{1}[R^\beta_{\gamma(k)} \notin S_{\gamma(k)}]
= \sum_{k=1}^{\lambda(T)} \mathbf{1}[Y_{\gamma(k)} \notin S_{\gamma(k)}],
\quad T = 1, 2, \ldots,
\]

and the miss rate under PMM($\beta$, p) is given by

\[
M_\pi(R^\beta)
= \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}[R^\beta_t \notin S_t]
= \lim_{T \to \infty} \frac{\lambda(T)}{T} \cdot \frac{1}{\lambda(T)} \sum_{k=1}^{\lambda(T)} \mathbf{1}[Y_{\gamma(k)} \notin S_{\gamma(k)}].
\tag{9.24}
\]


By the Strong Law of Large Numbers, we see that the limit of the first term in (9.24)
is simply

\[ \lim_{T \to \infty} \frac{\lambda(T)}{T} = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}[Z_t = 0] = \beta \quad a.s. \tag{9.25} \]

The limit of the second term in (9.24) does not in general have a closed-form
expression. However, it does admit a simple expression in the special case when
the cache replacement policy $\pi$ satisfies the following condition:

(*) For all $t = 1, 2, \ldots$, if $R_t = R_{t-1}$, then the cache state and eviction rule at time
$t + 1$ are the same as those at time t, i.e., $\Omega_{t+1} = \Omega_t$ and $U_{t+1} = U_t$.


Under this condition, we can write the second limit as

\[
\lim_{T \to \infty} \frac{1}{\lambda(T)} \sum_{k=1}^{\lambda(T)} \mathbf{1}[Y_{\gamma(k)} \notin S_{\gamma(k)}]
= \lim_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} \mathbf{1}[Y_{\gamma(k)} \notin S_{\gamma(k)}]
= M_\pi(p)
\tag{9.26}
\]

where $M_\pi(p)$ is the miss rate of the IRM with popularity pmf p under the policy $\pi$.
The last equality follows from the fact that the rvs $\{Y_{\gamma(k)},\ k = 1, 2, \ldots\}$ form an IRM
with popularity pmf p and that by Condition (*), the cache sets $\{S_{\gamma(k)},\ k = 1, 2, \ldots\}$
are similar to the cache sets under the policy $\pi$ when the input is the IRM sequence
$\{Y_{\gamma(k)},\ k = 1, 2, \ldots\}$. Combining (9.24), (9.25) and (9.26) yields the expression for the
miss rate of PMM($\beta$, p) as

\[ M_\pi(R^\beta) = \beta \cdot M_\pi(p). \tag{9.27} \]

Condition (*) is satisfied by many cache replacement policies of interest, e.g., the policy
$A_0$, the LRU, FIFO and random policies but not by the CLIMB policy. Equipped with
the expression (9.27), we can now state the following monotonicity result.

Theorem 9.8 Assume that the cache replacement policy $\pi$ satisfies Condition (*) and
that for each $k = 1, 2$, the request stream $R^{\beta_k}$ is modeled according to PMM($\beta_k$, $p_k$). If
$p_1 = p_2$ and $0 < \beta_2 < \beta_1$, then it holds that

\[ M_\pi(R^{\beta_2}) \le M_\pi(R^{\beta_1}). \tag{9.28} \]

Moreover, if the mapping $p \to M_\pi(p)$ is Schur-concave, then whenever $p_1 \prec p_2$ and
$0 < \beta_2 < \beta_1$, the comparison (9.28) also holds.

In  view  of  Theorem  9.4,  we  conclude  that  the  folk  theorem  on  the  miss  rate  indeed 
holds  for  the  PMM  under  any  cache  replacement  policy  which  satisfies  Condition  (*). 
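The mechanism behind (9.27) can be observed on a simulated trace. The sketch below rests on assumptions that we state explicitly: the PMM is taken to repeat the previous request with probability $1 - \beta$ and to draw a fresh document from p otherwise, the very first request is a fresh draw, the LRU cache starts empty with size at least 1, and all function names are hypothetical.

```python
import random
from collections import OrderedDict


def pmm_stream(beta, p, length, rng):
    """Return the PMM requests and the subsequence of fresh draws from p."""
    docs = list(range(1, len(p) + 1))
    stream, fresh = [], []
    for t in range(length):
        if t == 0 or rng.random() < beta:
            doc = rng.choices(docs, weights=p)[0]   # chosen independently of the past
            fresh.append(doc)
        else:
            doc = stream[-1]                        # repeat the previous request
        stream.append(doc)
    return stream, fresh


def lru_miss_count(requests, cache_size):
    """Number of misses of an initially empty LRU cache."""
    cache = OrderedDict()
    misses = 0
    for doc in requests:
        if doc in cache:
            cache.move_to_end(doc)
        else:
            misses += 1
            if len(cache) >= cache_size:
                cache.popitem(last=False)           # evict the least recently used
            cache[doc] = None
    return misses
```

Because a repeated request is always a hit under LRU and leaves the cache unchanged, the miss count on the full stream equals the miss count on the fresh draws alone. Dividing by the stream length, the fraction of fresh draws tends to $\beta$, which is the factorization expressed by (9.27).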

9.5.2  HOMM 

Consider the following situation: Let R be HOMM($h$, $\boldsymbol{\alpha}$, p) for some pmf vectors p on
$\mathcal{N}$ and $\boldsymbol{\alpha}$ on $\{0, \ldots, h\}$. For some $0 < c < 1$, let $R^c$ denote HOMM($h$, $\boldsymbol{\alpha}^c$, p) where $\boldsymbol{\alpha}^c$
is obtained from $\boldsymbol{\alpha}$ by taking $\alpha^c_k = c\alpha_k$ for each $k = 1, \ldots, h$, and $\beta^c = 1 - c(1 - \beta) =
\beta + (1 - c)(1 - \beta)$. Obviously, $\beta^c > \beta$ while $\alpha^c_k < \alpha_k$ for each $k = 1, \ldots, h$. In other
words, under HOMM($h$, $\boldsymbol{\alpha}$, p), there is a smaller probability to generate a new request
independently of past requests than under HOMM($h$, $\boldsymbol{\alpha}^c$, p). Therefore, in an attempt to
generalize Theorem 9.3, it is reasonable to think that HOMM($h$, $\boldsymbol{\alpha}^c$, p) has less temporal
correlations than HOMM($h$, $\boldsymbol{\alpha}$, p) according to the TC ordering, i.e., $R^c \le_{TC} R$. Taking
our cue from Theorem 9.8, we would then expect the inequality $M_\pi(R) \le M_\pi(R^c)$
to hold for some good caching policies. We summarize these expectations as the following
conjecture:

Conjecture 9.9 Assume the request stream $R = \{R_t,\ t = 0, 1, \ldots\}$ to be modeled
according to HOMM($h$, $\boldsymbol{\alpha}$, p). For some $0 < c < 1$, if the request stream $R^c =
\{R^c_t,\ t = 0, 1, \ldots\}$ is modeled according to HOMM($h$, $\boldsymbol{\alpha}^c$, p) with $\boldsymbol{\alpha}^c = (1 - c(1 - \beta),
c\alpha_1, \ldots, c\alpha_h)$, then the comparison $R^c \le_{TC} R$ holds. Furthermore, under some
appropriate cache replacement policy $\pi$, it holds that $M_\pi(R) \le M_\pi(R^c)$.
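The rescaling used in Conjecture 9.9 can be written out in a few lines. The sketch below assumes, as the statement of the conjecture suggests, that the pmf is stored as a list whose first entry is $\beta$ and whose remaining entries are $\alpha_1, \ldots, \alpha_h$; the function name is hypothetical.

```python
def scale_homm_pmf(alpha, c):
    """Return (1 - c(1 - beta), c*alpha_1, ..., c*alpha_h) for 0 < c < 1."""
    beta = alpha[0]                      # probability of a fresh, independent request
    return [1 - c * (1 - beta)] + [c * a_k for a_k in alpha[1:]]
```

For example, scaling (0.4, 0.3, 0.2, 0.1) by c = 0.5 gives (0.7, 0.15, 0.1, 0.05): the vector is still a pmf, its first entry has increased and every other entry has decreased.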

Establishing  this  conjecture  appears  to  be  much  more  difficult  than  for  the  PMM, 
and  requires  further  investigation.  However,  in  support  of  this  conjecture,  we  have 
carried  out  several  experiments  under  the  LRU  policy  when  the  input  to  the  cache  is 
modeled according to the HOMM. Throughout, we fix $N = 100$ and let the input
popularity pmf p be the Zipf-like distribution $p_\alpha$ (6.4)-(6.5) with parameter $\alpha = 0.8$.
We consider five different classes of HOMM, each with a different history window size
$h = 1, \ldots, 5$. In each class, the input stream $R^\beta$ (with $0 < \beta < 1$) is generated
according to HOMM($h$, $\boldsymbol{\alpha}_h(\beta)$, $p_\alpha$) with $\boldsymbol{\alpha}_h(\beta) = (\beta, \frac{1-\beta}{h}, \ldots, \frac{1-\beta}{h})$. The validity of
Conjecture 9.9 would require that the mapping $\beta \to M_{LRU}(R^\beta)$ be increasing.

From Figure 9.1, the miss rate is indeed found to be increasing as the parameter $\beta$
increases for all cases and for all cache sizes. When $h = 1$, HOMM reduces to PMM
and the results here confirm the validity of the expression (9.27) and of Theorem 9.8. It
is interesting to note that for a given cache size M, the miss rates of all HOMM input
streams with $h < M$ are the same as the miss rate of the PMM. This suggests some
form of insensitivity of the LRU miss rate under the HOMM to the history window size
h and to the pmf $\boldsymbol{\alpha}$. Lastly, for all cases and for all cache sizes, the miss rate always
goes to 0 as $\beta$ goes to 0. This is due to the fact that $\lim_{t \to \infty} P[R^0_t = R^0_{t-1}] = 1$ where
$R^0$ denotes the HOMM($h$, $\boldsymbol{\alpha}_h(0)$, $p_\alpha$).$^1$


9.5.3  LRUSM 

According to Theorem 9.7, the stationary LRUSM(a) with stack distance pmf a satisfying
condition (9.22) has stronger temporal correlations than the stationary
LRUSM(u). In the vein of Theorem 9.4, it is then natural to wonder when the
LRUSM(b) has weaker temporal correlations than the LRUSM(a) for a pmf b that is not
necessarily uniform. Theorem 9.7 suggests that this could happen when the pmf a is more
skewed toward the smaller values of stack distance than the pmf b. To capture the skewness
in the pmf vectors, we recall the notion of majorization introduced in Chapter 2 and
note that for any pmf a on $\mathcal{N}$, it holds that $u \prec a$. With majorization, we can now state
the following conjecture.

Conjecture 9.10 Consider request streams $R^a$ and $R^b$ which are modeled according
to the stationary LRUSM(a) and LRUSM(b), respectively. If both pmfs a and b satisfy
(9.22) with $b \prec a$, then the comparison $R^b \le_{TC} R^a$ holds.

When both pmfs a and b satisfy (9.22), the conditions (2.1)-(2.2) for the majorization
comparison $b \prec a$ to hold reduce to

\[ \sum_{i=1}^{n} b_i \le \sum_{i=1}^{n} a_i, \quad n = 1, \ldots, N - 1. \tag{9.29} \]

This condition is a possible formalization of the statement that the pmf a is more skewed
toward the smaller values of stack distance than the pmf b.$^2$

$^1$ Indeed, if R is modeled according to the HOMM($h$, $\boldsymbol{\alpha}$, $p_\alpha$) with $\beta = 0$, then it can be shown that
$\lim_{t \to \infty} P[R_t = R_{t-1}] = 1$ provided that the hth-order Markov chain $\{R_t,\ t = 0, 1, \ldots\}$ is aperiodic.

To glean evidence in favor of Conjecture 9.10, we consider the LRU policy and
recall that the miss rate under the LRU policy with cache size M for the LRUSM(a)
is given by (9.19). Combining (9.19) and (9.29), we conclude that for two LRUSM
request streams $R^a$ and $R^b$ satisfying the conditions of Conjecture 9.10, it holds that
$M_{LRU}(R^a) \le M_{LRU}(R^b)$. This is of course the desired inequality expressing the folk
theorem for miss rates under the LRU policy which would be expected if Conjecture
9.10 were to hold.
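The step from (9.29) to the miss rate comparison is purely arithmetic, since the tail sum in (9.19) is one minus a partial sum. The following sketch illustrates it on a pair of pmfs chosen by us for the example; the function names and the numerical tolerance are assumptions.

```python
from itertools import accumulate


def partial_sums_dominated(b, a, tol=1e-12):
    """Check (9.29): b_1 + ... + b_n <= a_1 + ... + a_n for n = 1, ..., N - 1."""
    pairs = list(zip(accumulate(b), accumulate(a)))[:-1]   # drop n = N (both sums equal 1)
    return all(sb <= sa + tol for sb, sa in pairs)


def lru_miss_rate(a, cache_size):
    """Tail sum of (9.19) for cache size M, with a[k-1] standing for a_k."""
    return sum(a[cache_size:])
```

With a = (0.5, 0.3, 0.15, 0.05) and b = (0.4, 0.3, 0.2, 0.1), both nonincreasing, condition (9.29) holds and the LRU miss rate under a does not exceed the one under b for every cache size.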


$^2$ The condition (9.29) is equivalent to the usual stochastic ordering between the stack distance rvs $D^a_t$
and $D^b_t$ associated with the request streams $R^a$ and $R^b$, respectively, where $D^a_t \le_{st} D^b_t$.

Chapter  10


The  Working  Set  Model 


In the last two chapters, we showed how comparisons in the majorization ordering of
popularity and in the TC ordering of temporal correlations can be translated into comparisons
of some well-known metrics, namely, the working set size, the inter-reference time and
the stack distance. In this chapter, we discuss results for the working set model and some
folk theorems under its companion memory management policy, the so-called Working
Set algorithm.

10.1  Definition 

The working set model was introduced by Denning [26] and some of its properties are
discussed in [27]. It can be defined as follows: Consider a request stream $R = \{R_t,\ t =
0, 1, \ldots\}$. Fix $t = 0, 1, \ldots$. For each $\tau = 1, 2, \ldots$, we define the working set $W(t, \tau; R)$
of length $\tau$ at time t to be the set of distinct documents occurring amongst the past
$\tau$ consecutive requests $R_{(t-\tau+1)^+}, \ldots, R_t$.$^1$ The size of the working set $W(t, \tau; R)$ is
denoted by $S(t, \tau; R)$. Under some appropriate conditions on the request stream R, it
holds that $S(t, \tau; R) \Longrightarrow_t S(\tau; R)$ where $S(\tau; R)$ is the steady state working set size
of length $\tau$. The rv $S(\tau; R)$ can be viewed as the number of distinct documents in $\tau$
consecutive requests in the steady state.

$^1$ For any $x \in \mathbb{R}$, we set $(x)^+ = \max(0, x)$.
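On a finite trace the working set and its size can be read off directly from the definition. The sketch below is illustrative; the function name and the convention that the list `requests` holds $R_0, R_1, \ldots$ are our own.

```python
def working_set(requests, t, tau):
    """Set of distinct documents among R_{(t-tau+1)^+}, ..., R_t."""
    start = max(0, t - tau + 1)          # the positive part truncates the window at time 0
    return set(requests[start:t + 1])
```

For the trace 1, 2, 1, 3, 3, 2 the working set of length 2 at time 3 is {1, 3}, and the working set of length 5 at time 1 is {1, 2} because only two requests are available by then.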

A basic quantity of interest associated with the working set size is its long-run average
defined by

\[ \bar{S}(\tau; R) = \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} S(t, \tau; R) \quad a.s. \tag{10.1} \]

for each $\tau = 1, 2, \ldots$. In the next lemma, we identify conditions on the request stream
R for the existence of these limits (10.1), in the process making a connection between
the limits (10.1) and the steady state working set sizes.

Lemma 10.1 Assume the request stream $R = \{R_t,\ t = 0, 1, \ldots\}$ to couple with a
stationary sequence of $\mathcal{N}$-valued rvs $\tilde{R} = \{\tilde{R}_t,\ t = 0, 1, \ldots\}$. Then, the a.s. limits
(10.1) exist and it holds that$^2$

\[ S(t, \tau; R) \Longrightarrow_t S(\tau; \tilde{R}), \quad \tau = 1, 2, \ldots. \tag{10.2} \]

If, in addition, the sequence $\tilde{R}$ is ergodic, then

\[ \bar{S}(\tau; R) = E[S(\tau; \tilde{R})], \quad \tau = 1, 2, \ldots. \tag{10.3} \]


A proof of Lemma 10.1 can be found in Appendix E.1. A special case of Lemma
10.1 occurs when the request stream R itself is stationary. In that case, the distribution
of $S(t, \tau; R)$ does not depend on t when $t \ge \tau - 1$, i.e., for each $\tau = 1, 2, \ldots$, we have

\[ S(t, \tau; R) =_{st} S(\tau - 1, \tau; R), \quad t = \tau, \tau + 1, \ldots. \tag{10.4} \]

Therefore, (10.2) automatically holds. Furthermore, if the request stream R is stationary
and ergodic, then (10.3) is also obtained.

$^2$ In fact, (10.2) holds under the weaker assumption that the request stream $R = \{R_t,\ t = 0, 1, \ldots\}$
is asymptotically stationary in that $\{R_{t+\tau},\ t = 0, 1, \ldots\} \Longrightarrow_\tau \{\tilde{R}_t,\ t = 0, 1, \ldots\}$ with $\tilde{R} = \{\tilde{R}_t,\ t =
0, 1, \ldots\}$ being a stationary sequence of $\mathcal{N}$-valued rvs.

10.2  The  effect  of  popularity


Assume the request stream $R = \{R_t,\ t = 0, 1, \ldots\}$ to be the IRM with popularity pmf p.
Under this enforced i.i.d. assumption, the request stream R is stationary and ergodic,
and from (10.4), we obtain

\[ S(\tau; R) =_{st} S(\tau - 1, \tau; R) = |\{R_0, \ldots, R_{\tau-1}\}|. \tag{10.5} \]

Since the IRM request stream R is characterized solely by its popularity pmf p, the
pmf of $S(\tau; R)$ clearly depends only on the pmf p and we shall recognize this fact
by denoting the working set size of length $\tau$ of the IRM by $S(\tau; p)$. Similarly, we
let $\bar{S}(\tau; p)$ denote the average working set size (10.1) of length $\tau$ of the IRM request
stream.

For positive integer $n = 1, 2, \ldots$ and pmf $\theta = (\theta(1), \ldots, \theta(N))$ on $\{1, \ldots, N\}$,
imagine the following experimental setup: An experiment has N distinct outcomes,
outcome i occurring with probability $\theta(i)$ ($i = 1, \ldots, N$). We carry out this experiment
n times under independent and statistically identical conditions. Let $X_i(n, \theta)$ denote
the number of times that outcome i occurs amongst these n trials ($i = 1, \ldots, N$). These
N rvs are organized into an $\mathbb{N}^N$-valued rv $X(n, \theta)$ known as the multinomial rv with
parameters n and $\theta$. Its distribution is given by

\[ P[X(n, \theta) = x] = \binom{n}{x_1, \ldots, x_N} \prod_{i=1}^{N} \theta(i)^{x_i} \]

whenever the integer components $(x_1, \ldots, x_N)$ of x satisfy $x_i \ge 0$ ($i = 1, \ldots, N$) and
$\sum_{i=1}^{N} x_i = n$.

With $X(n, \theta)$, we can associate the rv $K(n, \theta)$ given by

\[ K(n, \theta) := \sum_{i=1}^{N} \mathbf{1}[X_i(n, \theta) > 0]; \]

this rv records the number of distinct outcomes that occur amongst the n trials. The following
result was established by Wong and Yue [72] and deals with the Schur-concavity
of the tail probabilities

\[ \pi_\ell(n, \theta) := P[K(n, \theta) \ge \ell], \quad \ell = 1, 2, \ldots, \min(N, n). \]

Theorem 10.2 For each $n = 1, 2, \ldots$ and each $\ell = 1, 2, \ldots, \min(N, n)$, the mapping
$\theta \to \pi_\ell(n, \theta)$ is Schur-concave.

From (10.5), the working set size $S(\tau; p)$ of the IRM request stream with popularity
pmf p is simply the number of distinct outcomes $K(\tau, p)$ for the multinomial rv with
parameters $\tau$ and p. Thus, by combining Theorem 10.2 with the basic fact (3.2) on the
usual stochastic ordering, we get the following corollary.

Corollary 10.3 For admissible pmfs p and q on $\mathcal{N}$, it holds that

\[ S(\tau; q) \le_{st} S(\tau; p), \quad \tau = 1, 2, \ldots, \tag{10.6} \]

whenever $p \prec q$.

In  words,  the  more  skewed  the  popularity  pmf,  the  stronger  the  locality  of  reference  in 
the  IRM,  and  the  smaller  (in  the  strong  stochastic  sense)  the  working  set  size,  in  line 
with  one’s  intuition! 
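Corollary 10.3 can be illustrated at the level of means. Under the IRM, document i appears among $\tau$ independent requests with probability $1 - (1 - p(i))^\tau$, so the expected working set size is the sum of these probabilities over i. The sketch below evaluates this sum; the function name and the two example pmfs are our own choices.

```python
def expected_working_set_size(p, tau):
    """Expected number of distinct documents among tau i.i.d. requests drawn from p."""
    return sum(1 - (1 - p_i) ** tau for p_i in p)
```

With N = 4 and $\tau = 2$, the uniform pmf gives 1.75 while the more skewed pmf (0.7, 0.1, 0.1, 0.1), which majorizes it, gives 1.48, in agreement with the direction of the comparison (10.6).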

A simple consequence of Corollary 10.3 is the comparison of the average working
set sizes, namely

\[ \bar{S}(\tau; q) \le \bar{S}(\tau; p), \quad \tau = 1, 2, \ldots, \]

provided $p \prec q$. This is due to the facts that the comparisons (10.6) imply

\[ E[S(\tau; q)] \le E[S(\tau; p)], \quad \tau = 1, 2, \ldots, \]

and that under the IRM, Lemma 10.1 yields $\bar{S}(\tau; p) = E[S(\tau; p)]$ for all $\tau = 1, 2, \ldots$.

10.3  The  effect  of  temporal  correlations


As for popularity, it is expected that the stronger the temporal correlations
in the stream of requests, the smaller the working set size. We wish to formalize this
statement as was done for popularity in Corollary 10.3. However, with the help of the
TC ordering, we obtain only the comparison of the expectations of the working set sizes.

Theorem 10.4 For two request streams $R^1 = \{R^1_t,\ t = 0, 1, \ldots\}$ and $R^2 = \{R^2_t,\ t =
0, 1, \ldots\}$, if $R^1 \le_{TC} R^2$, then for each $t = 0, 1, \ldots$, it holds that

\[ E[S(t, \tau; R^2)] \le E[S(t, \tau; R^1)], \quad \tau = 1, 2, \ldots. \tag{10.7} \]

A proof of this theorem relies on the fact that the rv $S(t, \tau; R)$ can be expressed as a
combination of supermodular functions of the indicator sequences $\{V_t(i),\ t = 0, 1, \ldots\}$,
$i = 1, \ldots, N$, associated with the request stream R. Before giving a proof, we note the
following lemma [7, Lemma 2.1].

Lemma 10.5 If the mapping $\psi : \mathbb{R}^\tau \to \mathbb{R}$ is given by

\[ \psi(x) = \prod_{\ell=1}^{\tau} \psi^*(x_\ell), \quad x = (x_1, \ldots, x_\tau) \in \mathbb{R}^\tau \tag{10.8} \]

for some monotone mapping $\psi^* : \mathbb{R} \to \mathbb{R}$, then $\psi$ is supermodular.

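Proof of Theorem 10.4. Fix $t = 0, 1, \ldots$ and $\tau = 1, \ldots, t + 1$. The working set size
$S(t, \tau; R)$ of length $\tau$ at time t for the request stream R can be expressed in terms of
the corresponding indicator sequences $\{V_t(i),\ t = 0, 1, \ldots\}$, $i = 1, \ldots, N$, as follows:
From the definition of $S(t, \tau; R)$, we can write

\[
\begin{aligned}
S(t, \tau; R)
&= \sum_{i=1}^{N} \mathbf{1}[i \in \{R_{(t-\tau+1)^+}, \ldots, R_t\}] \\
&= \sum_{i=1}^{N} \mathbf{1}[i \in \{R_{t-\tau+1}, \ldots, R_t\}] \\
&= \sum_{i=1}^{N} \left(1 - \mathbf{1}[i \notin \{R_{t-\tau+1}, \ldots, R_t\}]\right) \\
&= \sum_{i=1}^{N} \Big(1 - \prod_{\ell=0}^{\tau-1} \mathbf{1}[R_{t-\ell} \ne i]\Big) \\
&= \sum_{i=1}^{N} \Big(1 - \prod_{\ell=0}^{\tau-1} \left(1 - \mathbf{1}[R_{t-\ell} = i]\right)\Big) \\
&= \sum_{i=1}^{N} \Big(1 - \prod_{\ell=0}^{\tau-1} \left(1 - V_{t-\ell}(i)\right)\Big) \\
&= \sum_{i=1}^{N} \left(1 - \psi(V_{t-\tau+1}(i), \ldots, V_t(i))\right)
\end{aligned}
\tag{10.9}
\]

where the mapping $\psi : \mathbb{R}^\tau \to \mathbb{R}$ is of the form (10.8) with mapping $\psi^* : \mathbb{R} \to \mathbb{R}$ given
by

\[ \psi^*(x) = 1 - x, \quad x \in \mathbb{R}. \tag{10.10} \]

By Lemma 10.5, the mapping $\psi$ is supermodular since $\psi^*$ defined at (10.10) is monotone.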
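The representation (10.9) is an identity on every sample path and can be checked numerically. The sketch below compares the direct count of distinct documents with the sum over documents of one minus a product of complemented indicators; the function names are hypothetical and documents are assumed to be labeled 1 through `n_docs`.

```python
from math import prod


def working_set_size(requests, t, tau):
    """Direct count of distinct documents among R_{(t-tau+1)^+}, ..., R_t."""
    return len(set(requests[max(0, t - tau + 1):t + 1]))


def working_set_size_via_indicators(requests, t, tau, n_docs):
    """Same quantity through the indicator form of (10.9)."""
    window = range(max(0, t - tau + 1), t + 1)
    total = 0
    for i in range(1, n_docs + 1):
        # V_s(i) = 1 if R_s = i and 0 otherwise; the product equals 1
        # exactly when document i never occurs in the window.
        never_seen = prod(1 - int(requests[s] == i) for s in window)
        total += 1 - never_seen
    return total
```

The two functions agree for every t and every window length on any trace, including window lengths larger than t + 1, for which the window is truncated at time 0.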

Equipped with the expressions (10.8)-(10.10), we are now ready to prove Theorem
10.4. Recall that for any two request streams $R^1$ and $R^2$ such that $R^1 \le_{TC} R^2$, we have
the comparison $\{V^1_t(i),\ t = 0, 1, \ldots\} \le_{sm} \{V^2_t(i),\ t = 0, 1, \ldots\}$ for each $i = 1, \ldots, N$.
From the supermodularity of $\psi$ and the definition of the sm ordering, it then follows that

\[ E[\psi(V^1_{t-\tau+1}(i), \ldots, V^1_t(i))] \le E[\psi(V^2_{t-\tau+1}(i), \ldots, V^2_t(i))] \tag{10.11} \]

for all $i = 1, \ldots, N$. Combining inequalities (10.11) with (10.9) yields the comparison
(10.7) for each $\tau = 1, \ldots, t + 1$. Upon noting that for all $\tau > t + 1$,

\[ S(t, \tau; R^k) = S(t, t + 1; R^k), \quad k = 1, 2, \]

we get the desired comparisons (10.7) for all $\tau = 1, 2, \ldots$. ■

Corollary 10.6 Assume that for each $k = 1, 2$, the request stream $R^k = \{R^k_t,\ t =
0, 1, \ldots\}$ couples with a stationary sequence of $\mathcal{N}$-valued rvs $\tilde{R}^k = \{\tilde{R}^k_t,\ t = 0, 1, \ldots\}$.
If $R^1 \le_{TC} R^2$, then it holds that

\[ E[S(\tau; \tilde{R}^2)] \le E[S(\tau; \tilde{R}^1)], \quad \tau = 1, 2, \ldots, \tag{10.12} \]

where for each $k = 1, 2$, $S(\tau; \tilde{R}^k)$ is the steady state working set size of the request
stream $R^k$. In addition, if $\tilde{R}^1$ and $\tilde{R}^2$ are stationary and ergodic, then it holds that

\[ \bar{S}(\tau; R^2) \le \bar{S}(\tau; R^1), \quad \tau = 1, 2, \ldots, \tag{10.13} \]

where for each $k = 1, 2$, $\bar{S}(\tau; R^k)$ is the average working set size of the request stream
$R^k$.

Proof. Fix $\tau = 1, 2, \ldots$ and $k = 1, 2$. Under the assumptions above, Lemma 10.1
already yields the convergence

\[ S(t, \tau; R^k) \Longrightarrow_t S(\tau; \tilde{R}^k). \tag{10.14} \]

Next, because $S(t, \tau; R^k) \le N$ for every $t = 0, 1, \ldots$, the sequence $\{S(t, \tau; R^k),\ t =
0, 1, \ldots\}$ is uniformly integrable. Combining this fact with (10.14), it follows from [11,
Thm. 5.4, p. 32] that

\[ \lim_{t \to \infty} E[S(t, \tau; R^k)] = E[S(\tau; \tilde{R}^k)]. \tag{10.15} \]

Invoking (10.7) and (10.15), we obtain the steady state comparisons (10.12). The
comparisons (10.13) for the average working set sizes follow from (10.12) under the
additional ergodicity assumption of the coupling processes associated with $R^1$ and $R^2$.

Corollary 10.6 demonstrates that for a request stream R exhibiting temporal correlations,
the independent version $\hat{R}$ of R can be used to provide various performance
bounds, which in turn can be used for cache dimensioning associated with the request
stream R. We illustrate this argument with three request models, namely the HOMM,
PMM and LRUSM request streams, with the help of Theorems 9.3, 9.4 and 9.7, respectively.
Upon noting that the stationary HOMM and PMM are ergodic Markov chains,
we obtain


Corollary 10.7 Assume the request stream $R = \{R_t,\ t = 0, 1, \ldots\}$ to be modeled
according to the stationary HOMM($h$, $\boldsymbol{\alpha}$, p) with admissible popularity pmf p. Then, it
holds that

\[ \bar{S}(\tau; R) \le \bar{S}(\tau; \hat{R}), \quad \tau = 1, 2, \ldots, \]

where $\hat{R}$ is the IRM with popularity pmf p.

Corollary 10.8 Assume that for each $k = 1, 2$, the request stream $R^{\beta_k} = \{R^{\beta_k}_t,\ t =
0, 1, \ldots\}$ is modeled according to the stationary PMM($\beta_k$, p) with admissible popularity
pmf p. If $0 < \beta_2 < \beta_1$, then it holds that

\[ \bar{S}(\tau; R^{\beta_2}) \le \bar{S}(\tau; R^{\beta_1}), \quad \tau = 1, 2, \ldots. \]


Lastly,  we  note  the  comparison  of  the  working  set  sizes  under  the  LRUSM. 


Corollary 10.9 Assume the request stream $R^a = \{R^a_t,\ t = 0, 1, \ldots\}$ to be modeled
according to the stationary LRUSM(a) with stack distance pmf a satisfying (9.22). Then,
it holds that

\[ E[S(\tau; R^a)] \le E[S(\tau; \hat{R}^a)], \quad \tau = 1, 2, \ldots, \]

where $\hat{R}^a$ is the IRM with uniform popularity pmf u.

10.4  The  Working  Set  algorithm


Fix $\tau = 1, 2, \ldots$. The Working Set (WS) algorithm with length $\tau$ is the algorithm
that maintains the previous $\tau$ consecutive requested documents $R_{(t-\tau)^+}, \ldots, R_{t-1}$ in the
cache $S_t$ at time t. In other words, the cache $S_t$ is simply the working set $W(t - 1, \tau; R)$
with the convention $W(-1, \tau; R) = \emptyset$. This algorithm differs from other demand-driven
caching policies in that the number of documents in the cache may change over
time while demand-driven caching policies have a fixed cache size M (as soon as each
document has been called at least once). The number of documents in the cache at time t
under the WS algorithm is the number of distinct documents in $W(t - 1, \tau; R)$,
which is the working set size $S(t - 1, \tau; R)$.

The operation of the WS algorithm can be described as follows: For each $t = 0, 1, \ldots$, let $\Omega_t$ be the state of the cache at time $t$ defined by

$$\Omega_t = (R_{(t-\tau)^+}, \ldots, R_{t-1}).$$

It is easy to see from this definition that the cache state $\Omega_{t+1}$ is completely determined by the previous cache state $\Omega_t$ and the current request $R_t$. Furthermore, the cache set $S_t$ can be recovered from $\Omega_t$ by taking

$$S_t = \{i = 1, \ldots, N : i \in \Omega_t\} = W(t-1, \tau; R), \quad t = 0, 1, \ldots.$$

For $t \ge \tau$, regardless of whether a cache miss occurs, the WS algorithm evicts the document $R_{t-\tau}$ if $R_{t-\tau} \notin W(t, \tau; R)$, and does not evict any document otherwise.
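The operation just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `ws_simulate` and the sample trace are assumptions made here, and a bounded deque stands in for the cache state $\Omega_t$.

```python
from collections import deque


def ws_simulate(requests, tau):
    """Run the WS algorithm of length tau on a finite trace.

    Returns (misses, cache_sizes), where misses[t] tells whether R_t is
    absent from the cache S_t, and cache_sizes[t] is the working set size
    S(t-1, tau; R). Names are illustrative, not taken from the text.
    """
    window = deque(maxlen=tau)  # plays the role of Omega_t (oldest request first)
    misses, cache_sizes = [], []
    for doc in requests:
        cache = set(window)             # S_t = W(t-1, tau; R)
        cache_sizes.append(len(cache))  # number of distinct documents cached
        misses.append(doc not in cache)
        window.append(doc)              # the deque drops R_{t-tau} once it is full
    return misses, cache_sizes


trace = [1, 2, 1, 3, 3, 2, 1]
misses, sizes = ws_simulate(trace, tau=2)
```

Note that the cache size fluctuates along the trace, which is the feature that distinguishes the WS algorithm from fixed-size demand-driven policies.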

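The miss rate of the WS algorithm with length $\tau$ can be defined in the same way as in the case of demand-driven caching; it is given by the a.s. limit

$$\begin{aligned}
M_{WS}(R) &= \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}[R_t \notin S_t] \quad \text{a.s.} \\
&= \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}[R_t \notin W(t-1, \tau; R)] \quad \text{a.s.} \qquad (10.16)
\end{aligned}$$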
We next explore the folk theorems for miss rates and for output streams under the WS algorithm. We do so first for IRM input streams and then for general input streams exhibiting temporal correlations.

10.4.1  Under  the  IRM 

We first assume the input to the cache to be modeled according to the IRM with popularity pmf $p$. Under this assumption, we show that the folk theorems for the miss rate and the output of a cache under the WS algorithm do not hold in general. This comes as no surprise since the WS algorithm is a close cousin of the LRU policy: the LRU policy of cache size $M$ can be obtained from the WS algorithm by varying its length $\tau$ so that the cache keeps the $M$ most recent distinct documents.

Miss  rate  of  WS  algorithm 

It is known [2, 27] that the miss rate $M_{WS}(p)$ of the WS algorithm with length $\tau$ under the IRM with popularity pmf $p$ is given by

$$M_{WS}(p) = \sum_{i=1}^{N} p(i)(1 - p(i))^{\tau}. \qquad (10.17)$$

Unfortunately, the miss rate function $M_{WS}(p)$ is not Schur-concave in $p$ for $\tau = 2, 3, \ldots$. It is Schur-concave only when $\tau = 1$, in which case the WS algorithm coincides with any demand-driven caching policy of cache size $M = 1$. These results are contained in

Theorem 10.10 Assume the input to be modeled according to the IRM with popularity pmf $p$. The miss rate function $M_{WS}(p)$ under the WS algorithm with length $\tau$ is Schur-concave in the pmf $p$ when $\tau = 1$ and is not Schur-concave in the pmf $p$ when $\tau = 2, 3, \ldots$.
Proof. For each $\tau = 1, 2, \ldots$, the miss rate function $M_{WS}(p)$ in (10.17) is of the form

$$M_{WS}(p) = \sum_{i=1}^{N} g_\tau(p(i))$$

where the mapping $g_\tau : [0, 1] \to [0, 0.25]$ is given by $x \to x(1 - x)^{\tau}$. As we note from [49, 3.C.1, p. 64 and 3.C.1.c, p. 67], the function $M_{WS}(p)$ is Schur-concave if and only if the mapping $g_\tau$ is concave. It is now a simple matter to check that the mapping $g_\tau$ is concave only when $\tau = 1$ and not concave when $\tau = 2, 3, \ldots$, whence the desired result. ■
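As a numerical illustration of how the miss rate ordering can be reversed, the sketch below evaluates (10.17) for two pmfs on $N = 2$ documents with the uniform pmf majorized by the skewed one. The length $\tau = 5$ and the two pmfs are values picked here for illustration only; the function name is an assumption.

```python
def ws_miss_rate(p, tau):
    """Miss rate (10.17) of the WS algorithm of length tau under the IRM with pmf p."""
    return sum(x * (1.0 - x) ** tau for x in p)


uniform = [0.5, 0.5]  # majorized by every pmf on two documents
skewed = [0.6, 0.4]   # more skewed: uniform is majorized by skewed

# tau = 1: the more skewed pmf yields the smaller miss rate, as the folk theorem predicts.
m1_uniform, m1_skewed = ws_miss_rate(uniform, 1), ws_miss_rate(skewed, 1)

# tau = 5: the ordering is reversed, so Schur-concavity fails for this length.
m5_uniform, m5_skewed = ws_miss_rate(uniform, 5), ws_miss_rate(skewed, 5)
```

With these inputs the skewed pmf gives the smaller miss rate for $\tau = 1$ but the larger one for $\tau = 5$.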


Output  of  WS  algorithm 


By restricting the input streams to the class of IRMs, the output of the WS algorithm with length $\tau$ can be analyzed along the same lines as Theorem 5.2 for demand-driven caching policies. Indeed, for the IRM with popularity pmf $p$, the output popularity pmf $p^\star_{WS}$ under the WS algorithm with length $\tau$ is given by

$$p^\star_{WS}(i) = \frac{p(i)(1 - p(i))^{\tau}}{\sum_{j=1}^{N} p(j)(1 - p(j))^{\tau}}. \qquad (10.18)$$

As in the case of the miss rate, the folk theorem for the output, namely $p^\star_{WS} \prec p$, does not hold when $\tau = 2, 3, \ldots$; it holds only for $\tau = 1$, in which case the WS algorithm reduces to any demand-driven caching policy with cache size $M = 1$. Counterexamples for $\tau = 2, 3, \ldots$ are given below, where the IRM input has a Zipf-like popularity pmf with large $\alpha$.


Theorem 10.11 Assume the input to be modeled according to the IRM with Zipf-like popularity pmf $p_\alpha$ for some $\alpha \ge 0$. If the number of documents $N$ and the length $\tau$ of the WS algorithm satisfy the condition

$$N < 2^{\tau - 1} \quad \text{with } \tau > 1, \qquad (10.19)$$

then under the WS algorithm, there exists $\alpha^\star = \alpha^\star(\tau, N)$ such that $p^\star_{WS,\alpha} \prec p_\alpha$ does not hold for $\alpha > \alpha^\star$.


A  proof  of  this  theorem  is  given  in  Appendix  B.5. 
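The statement of Theorem 10.11 can be probed numerically. The sketch below assumes the Zipf-like pmf takes the usual form $p_\alpha(i) \propto i^{-\alpha}$ (its definition lies outside this section), computes the output pmf (10.18), and tests majorization through partial sums of decreasing rearrangements. The helper names and the values $N = 3$, $\tau = 3$, $\alpha = 5$ are illustrative choices made here; they satisfy (10.19) since $3 < 2^{2}$.

```python
from itertools import accumulate


def zipf_pmf(n_docs, alpha):
    """Zipf-like pmf, assumed here to be proportional to i**(-alpha)."""
    weights = [i ** (-alpha) for i in range(1, n_docs + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def ws_output_pmf(p, tau):
    """Output popularity pmf (10.18) of the WS algorithm of length tau."""
    weights = [x * (1.0 - x) ** tau for x in p]
    total = sum(weights)
    return [w / total for w in weights]


def is_majorized_by(x, y, tol=1e-12):
    """True if x is majorized by y (both pmfs): compare partial sums of sorted entries."""
    xs = accumulate(sorted(x, reverse=True))
    ys = accumulate(sorted(y, reverse=True))
    return all(a <= b + tol for a, b in zip(xs, ys))


p = zipf_pmf(3, 5.0)
holds_tau_1 = is_majorized_by(ws_output_pmf(p, 1), p)  # folk theorem holds
holds_tau_3 = is_majorized_by(ws_output_pmf(p, 3), p)  # folk theorem fails
```

For $\tau = 3$ the most popular input document is almost always cached, so the output concentrates on the second document and its least popular entry becomes smaller than that of the input, which violates majorization.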


10.4.2  Miss  rate  under  input  with  temporal  correlations 

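Given an input stream $R = \{R_t, t = 0, 1, \ldots\}$, let $\{V_t(i), t = 0, 1, \ldots\}$, $i = 1, \ldots, N$, be the indicator sequences (9.1) associated with it. Recall from (10.16) that a miss occurs at time $t$ when the document $R_t$ is not in the working set $W(t-1, \tau; R)$. Thus, the indicator function for the miss event at time $t \ge \tau$ can be written as

$$\begin{aligned}
\mathbf{1}[R_t \notin W(t-1, \tau; R)] &= \mathbf{1}[R_t \notin \{R_{t-\tau}, \ldots, R_{t-1}\}] \\
&= \sum_{i=1}^{N} \mathbf{1}[R_t = i]\, \mathbf{1}[i \notin \{R_{t-\tau}, \ldots, R_{t-1}\}] \qquad (10.20) \\
&= \sum_{i=1}^{N} \mathbf{1}[R_t = i] \prod_{\ell=1}^{\tau} \mathbf{1}[R_{t-\ell} \ne i] \\
&= \sum_{i=1}^{N} V_t(i) \prod_{\ell=1}^{\tau} (1 - V_{t-\ell}(i)) \\
&= \sum_{i=1}^{N} g(V_{t-\tau}(i), \ldots, V_t(i)) \qquad (10.21)
\end{aligned}$$

where we have set

$$g(x_0, \ldots, x_\tau) = x_\tau \prod_{\ell=0}^{\tau-1} (1 - x_\ell), \quad (x_0, \ldots, x_\tau) \in \mathbb{R}^{\tau+1}. \qquad (10.22)$$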
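The identity (10.21) can be checked on a short trace. The sketch below assumes that the indicator sequences (9.1) are $V_t(i) = \mathbf{1}[R_t = i]$, as the derivation suggests; function names and the trace are illustrative.

```python
def g(xs):
    """g(x_0, ..., x_tau) = x_tau * prod_{l < tau} (1 - x_l), as in (10.22)."""
    result = xs[-1]
    for x in xs[:-1]:
        result *= 1 - x
    return result


def miss_via_indicators(requests, t, tau, n_docs):
    """Right-hand side of (10.21), with V_s(i) taken to be 1 if R_s == i and 0 otherwise."""
    total = 0
    for i in range(1, n_docs + 1):
        v = [1 if requests[s] == i else 0 for s in range(t - tau, t + 1)]
        total += g(v)
    return total


def miss_direct(requests, t, tau):
    """Left-hand side of (10.20): 1 if R_t is not among R_{t-tau}, ..., R_{t-1}."""
    return int(requests[t] not in requests[t - tau:t])


trace = [1, 2, 1, 3, 3, 2, 1]
tau = 2
pairs = [(miss_via_indicators(trace, t, tau, 3), miss_direct(trace, t, tau))
         for t in range(tau, len(trace))]
```

Both sides agree at every time $t \ge \tau$ of the trace, since exactly one term of the sum over $i$ can be nonzero.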

Combining (10.16), (10.21) and (10.22) yields the miss rate under the WS algorithm as the limit

$$\begin{aligned}
M_{WS}(R) &= \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}[R_t \notin W(t-1, \tau; R)] \\
&= \lim_{T \to \infty} \frac{T - \tau + 1}{T} \cdot \frac{1}{T - \tau + 1} \sum_{t=\tau}^{T} \sum_{i=1}^{N} g(V_{t-\tau}(i), \ldots, V_t(i)) \\
&= \lim_{T \to \infty} \frac{1}{T} \sum_{t=\tau}^{T+\tau-1} \sum_{i=1}^{N} g(V_{t-\tau}(i), \ldots, V_t(i)) \quad \text{a.s.} \qquad (10.23)
\end{aligned}$$

and if the request stream $R$ admits some form of ergodicity, then the limit (10.23) exists. A condition for the existence of the limit (10.23) is given in the next lemma whose proof is available in Appendix E.2.


Lemma 10.12 Fix $\tau = 1, 2, \ldots$. Assume the request stream $R = \{R_t, t = 0, 1, \ldots\}$ to couple with a stationary and ergodic sequence of $\mathcal{N}$-valued rvs $\tilde{R} = \{\tilde{R}_t, t = 0, 1, \ldots\}$. Then, the a.s. limit (10.23) exists and is given by

$$M_{WS}(R) = \lim_{t \to \infty} \sum_{i=1}^{N} E[g(V_{t-\tau}(i), \ldots, V_t(i))] \quad \text{a.s.} \qquad (10.24)$$

In particular, if $R$ is stationary and ergodic, then

$$M_{WS}(R) = \sum_{i=1}^{N} P[R_\tau = i, R_\ell \ne i, \ell = 0, \ldots, \tau - 1]. \qquad (10.25)$$

To establish the folk theorem to the effect that the stronger the temporal correlations, the smaller the miss rate, we need to show that

$$M_{WS}(R^2) \le M_{WS}(R^1) \quad \text{whenever } R^1 \le_{TC} R^2. \qquad (10.26)$$

Therefore, upon recalling the definitions of the TC and sm orderings, we see that establishing (10.26) amounts to showing that the mapping $g$ given in (10.22) is submodular.³ Unfortunately, the mapping $g$ is not submodular in general; only in the special case $\tau = 1$ is $g$ a submodular function. We shall discuss these issues by first showing the positive result when $\tau = 1$ and then providing counterexamples using the PMM when $\tau > 1$.

³ A function $\varphi : \mathbb{R}^n \to \mathbb{R}$ is said to be submodular if $-\varphi$ is supermodular.
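The dichotomy between $\tau = 1$ and $\tau > 1$ can be seen by brute force. The sketch below tests the lattice inequality $g(x \vee y) + g(x \wedge y) \le g(x) + g(y)$ only on the binary points $\{0, 1\}^{\tau+1}$, which is where indicator sequences take their values. A failure there rules out submodularity on $\mathbb{R}^{\tau+1}$, while passing the binary check is only a necessary condition. Function names are illustrative.

```python
from itertools import product


def g(xs):
    """g(x_0, ..., x_tau) = x_tau * prod_{l < tau} (1 - x_l), as in (10.22)."""
    result = xs[-1]
    for x in xs[:-1]:
        result *= 1 - x
    return result


def is_submodular_on_binary(f, dim):
    """Check f(x v y) + f(x ^ y) <= f(x) + f(y) for all x, y in {0,1}^dim."""
    points = list(product((0, 1), repeat=dim))
    for x in points:
        for y in points:
            join = tuple(max(a, b) for a, b in zip(x, y))
            meet = tuple(min(a, b) for a, b in zip(x, y))
            if f(join) + f(meet) > f(x) + f(y):
                return False
    return True


results = {tau: is_submodular_on_binary(g, tau + 1) for tau in (1, 2, 3)}
```

For $\tau = 2$, the pair $x = (1, 0, 1)$, $y = (0, 1, 1)$ already violates the inequality, since $g(x) = g(y) = g(x \vee y) = 0$ while $g(x \wedge y) = 1$.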


[$\tau = 1$] - When $\tau = 1$, we note that $S(t-1, \tau; R) = 1$ for all $t = 1, 2, \ldots$, and the WS algorithm coincides with any demand-driven caching policy having cache size $M = 1$. In that case, the only document in the cache at time $t$ is the document $R_{t-1}$ and a miss occurs when $R_t \ne R_{t-1}$. The folk theorem holds in this special case for all demand-driven caching policies.


Theorem 10.13 Consider an arbitrary demand-driven replacement policy $\pi$ with $M = 1$. If the request streams $R^1$ and $R^2$ satisfy the relation $R^1 \le_{TC} R^2$, then it holds that

$$P[R_t^2 \notin S_t^2] \le P[R_t^1 \notin S_t^1], \quad t = 1, 2, \ldots. \qquad (10.27)$$


Proof. For each $t = 1, 2, \ldots$, we have from (10.21)-(10.22) that

$$\mathbf{1}[R_t \notin S_t] = \mathbf{1}[R_t \ne R_{t-1}] = \sum_{i=1}^{N} g(V_{t-1}(i), V_t(i))$$

with the mapping $g : \mathbb{R}^2 \to \mathbb{R}$ being given by

$$g(x_0, x_1) = x_1 - x_0 x_1, \quad (x_0, x_1) \in \mathbb{R}^2.$$

Because the mapping $(x_0, x_1) \to x_0 x_1$ is supermodular, the mapping $(x_0, x_1) \to -x_0 x_1$ is submodular. The mapping $(x_0, x_1) \to x_1$ being submodular, the mapping $g$ is therefore submodular since the sum of two submodular functions is still a submodular function.

Given two request streams $R^1$ and $R^2$ such that $R^1 \le_{TC} R^2$, we recall the comparisons $\{V_t^1(i), t = 0, 1, \ldots\} \le_{sm} \{V_t^2(i), t = 0, 1, \ldots\}$ for each $i = 1, \ldots, N$. Thus by the definition of the sm ordering, we obtain for each $t = 1, 2, \ldots$,

$$\begin{aligned}
P[R_t^2 \notin S_t^2] &= \sum_{i=1}^{N} E[g(V_{t-1}^2(i), V_t^2(i))] \\
&\le \sum_{i=1}^{N} E[g(V_{t-1}^1(i), V_t^1(i))] \\
&= P[R_t^1 \notin S_t^1]. \qquad ■
\end{aligned}$$


Corollary 10.14 Consider an arbitrary demand-driven replacement policy $\pi$ with $M = 1$. If the request streams $R^1$ and $R^2$ couple with stationary and ergodic sequences of $\mathcal{N}$-valued rvs $\tilde{R}^1$ and $\tilde{R}^2$, respectively, and satisfy the relation $R^1 \le_{TC} R^2$, then it holds that

$$M_{WS}(R^2) \le M_{WS}(R^1).$$

Proof. Under the assumptions above, the miss rate of the request stream $R^k$ for each $k = 1, 2$, can be obtained using Lemma 10.12 and is given by

$$M_{WS}(R^k) = \lim_{t \to \infty} P[R_t^k \notin S_t^k] \quad \text{a.s.}$$

The desired result is now immediate from (10.27). ■


[$\tau > 1$] - The folk theorem (10.26) does not necessarily hold when $\tau > 1$ as we now demonstrate via counterexamples when the PMM is taken to be the input to the cache. The miss rate of the WS algorithm with length $\tau$ for PMM($\beta$, $p$) [2] is given by

$$M_{WS}(\beta, p) = \beta \sum_{i=1}^{N} p(i)(1 - p(i))(1 - \beta p(i))^{\tau - 1}. \qquad (10.28)$$

From Section 9.3, we would expect that as the strength of temporal correlations increases, i.e., the value of the parameter $\beta$ decreases, the miss rate $M_{WS}(\beta, p)$ should be decreasing. To put it differently, the mapping $\beta \to M_{WS}(\beta, p)$ should be increasing when the popularity pmf $p$ is held fixed.

However, this is not always the case as we show in the counterexamples where the PMM stream is assumed to have the uniform popularity pmf $u = (\frac{1}{N}, \ldots, \frac{1}{N})$.

Theorem 10.15 Fix $\tau = 2, 3, \ldots$, and assume the input to be modeled according to PMM($\beta$, $u$). Under the WS algorithm with length $\tau$, the miss rate function $M_{WS}(\beta, u)$ given in (10.28) is increasing in $\beta$ when $\beta < \frac{N}{\tau}$ and decreasing in $\beta$ when $\beta > \frac{N}{\tau}$.

Thus, the folk theorem always holds when the length $\tau$ of the WS algorithm is smaller than the number of documents $N$ (since then $\beta \le 1 < \frac{N}{\tau}$) but may fail to hold otherwise.

Proof. When the PMM has the uniform popularity pmf $u$, the expression (10.28) for the miss rate under the WS algorithm becomes

$$M_{WS}(\beta, u) = \beta \left(1 - \frac{1}{N}\right) \left(1 - \frac{\beta}{N}\right)^{\tau - 1}.$$

Differentiating this expression with respect to $\beta$ yields

$$\frac{d}{d\beta} M_{WS}(\beta, u) = \left(1 - \frac{1}{N}\right) \left(1 - \frac{\beta}{N}\right)^{\tau - 2} \left(1 - \frac{\tau \beta}{N}\right).$$

Thus, the miss rate function $M_{WS}(\beta, u)$ is increasing when $1 - \frac{\tau \beta}{N} > 0$, or equivalently, $\beta < \frac{N}{\tau}$, and is decreasing when $1 - \frac{\tau \beta}{N} < 0$, or equivalently, $\beta > \frac{N}{\tau}$. ■
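The turning point at $\beta = N/\tau$ is easy to observe numerically from (10.28). In the sketch below the values $N = 2$ and $\tau = 4$ are chosen here so that $N/\tau = 0.5$ falls inside the range of $\beta$; the function name is illustrative.

```python
def pmm_ws_miss_rate(beta, p, tau):
    """Miss rate (10.28) of the WS algorithm of length tau under PMM(beta, p)."""
    return beta * sum(x * (1 - x) * (1 - beta * x) ** (tau - 1) for x in p)


n_docs, tau = 2, 4  # tau > N, so the turning point N / tau = 0.5 lies inside (0, 1]
u = [1.0 / n_docs] * n_docs
rates = {beta: pmm_ws_miss_rate(beta, u, tau) for beta in (0.25, 0.5, 0.75, 1.0)}
```

Here the miss rate rises up to $\beta = 0.5$ and falls afterwards, so decreasing $\beta$ from 1 (stronger temporal correlations) initially increases the miss rate, against the folk theorem.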
Chapter 11

Inter-reference Time and Stack Distance

In this chapter, we continue the program announced in Chapter 10 as we seek the appropriate comparisons for the inter-reference times and the stack distances when the request streams are comparable in either the majorization or the TC orderings.

11.1  Inter-reference  time 

The  notion  of  inter-reference  time  in  the  stream  of  requests  has  recently  received  some 
attention  as  a  way  of  characterizing  temporal  correlations  [34,  40,  53]. 

First a definition. Given a request stream $R = \{R_t, t = 0, 1, \ldots\}$, for each $t = 0, 1, \ldots$, we define the inter-reference time $T(t; R)$ as the rv given by

$$T(t; R) := \inf\{\tau = 1, 2, \ldots, t : R_t = R_{t-\tau}\} \qquad (11.1)$$

with the convention that $T(t; R) = t + 1$ if $R_{t-\tau} \ne R_t$ for all $\tau = 1, \ldots, t$. As for the working set size, under some appropriate conditions on the request stream $R$, $T(t; R) \Rightarrow_t T(R)$ where the steady state inter-reference time $T(R)$ describes the time between two consecutive requests for the same document. One such condition is given in

Lemma 11.1 Assume the request stream $R = \{R_t, t = 0, 1, \ldots\}$ to be asymptotically stationary, i.e., $\{R_{t+\ell}, t = 0, 1, \ldots\} \Rightarrow_\ell \{\tilde{R}_t, t = 0, 1, \ldots\}$ with $\tilde{R} = \{\tilde{R}_t, t = 0, 1, \ldots\}$ being a stationary sequence of $\mathcal{N}$-valued rvs. Then, it holds that

$$T(t; R) \Rightarrow_t T(R). \qquad (11.2)$$

A proof of Lemma 11.1 is given in Appendix E.3. Lastly, we note that if the request stream $R$ is stationary and ergodic, then the pmf of the steady state inter-reference time $T(R)$ is given by the limits

$$P[T(R) = k] = \lim_{t \to \infty} \frac{1}{t} \sum_{s=1}^{t} \mathbf{1}[T(s; R) = k] \quad \text{a.s.}, \quad k = 1, 2, \ldots.$$

11.1.1  The  effect  of  popularity 

We first study the effect of popularity on the inter-reference time by assuming the request stream $R$ to be the IRM with popularity pmf $p$. Under the IRM, the request stream $R$ is stationary and ergodic, in which case (11.2) holds. In fact, $T(R)$ can be represented by

$$T(R) =_{st} \inf\{t = 1, 2, \ldots : R_t = R_0\} \qquad (11.3)$$

since the i.i.d. process $\{R_t, t = 0, 1, \ldots\}$ is reversible. The main comparison for the steady state inter-reference times is given in terms of the convex ordering.

Theorem 11.2 Assume that request streams $R^1$ and $R^2$ are modeled according to the IRM with admissible popularity pmfs $p^1$ and $p^2$, respectively. Then, it holds that

$$T(R^1) \le_{cx} T(R^2) \qquad (11.4)$$

whenever $p^1 \prec p^2$.
Thus, the more skewed the popularity pmf, the stronger the locality of reference in the IRM, and the more variable the inter-reference time, in that (11.4) implies $E[T(R^1)] = E[T(R^2)]$ and $\mathrm{Var}(T(R^1)) \le \mathrm{Var}(T(R^2))$. This can be explained by observing that a document with high probability of request is likely to be requested again in the near future, leading to smaller values for $T(R)$ and correspondingly larger deviation from its mean.


Proof. It is well known [59, Thm. 2.A.1, p. 57] that the comparison (11.4) between the $\{1, 2, \ldots\}$-valued rvs $T(R^1)$ and $T(R^2)$ is equivalent to

$$\sum_{\tau=n}^{\infty} P[T(R^1) > \tau] \le \sum_{\tau=n}^{\infty} P[T(R^2) > \tau] \qquad (11.5)$$

for all $n = 1, 2, \ldots$, with

$$E[T(R^1)] = E[T(R^2)]. \qquad (11.6)$$

Consider an IRM request stream $R$ with popularity pmf $p$ and fix $i = 1, \ldots, N$. By using the representation (11.3), we note that

$$P[T(R) = \tau \mid R_0 = i] = p(i)(1 - p(i))^{\tau - 1}, \quad \tau = 1, 2, \ldots,$$

i.e., conditional on $R_0 = i$, the inter-reference time $T(R)$ is geometrically distributed with parameter $p(i)$. Consequently, for each $n = 0, 1, \ldots$, we find

$$P[T(R) > n \mid R_0 = i] = \sum_{\tau=n+1}^{\infty} P[T(R) = \tau \mid R_0 = i] = (1 - p(i))^{n},$$

whence

$$P[T(R) > n] = \sum_{i=1}^{N} p(i)(1 - p(i))^{n}.$$

Next, summing over $\tau \ge n$ and using the admissibility of $p$, we obtain

$$\psi_n(p) := \sum_{\tau=n}^{\infty} P[T(R) > \tau] = \sum_{i=1}^{N} (1 - p(i))^{n}.$$

In particular, with $n = 0$, this last calculation yields

$$E[T(R)] = \sum_{\tau=0}^{\infty} P[T(R) > \tau] = N,$$

and this independently of $p$! In other words, (11.6) holds.

It is a simple matter to see that for each $n = 1, 2, \ldots$, the mapping $t \to (1 - t)^{n}$ is convex on $[0, 1]$. By a classical result of Schur [49, C.1, p. 64], the mapping $x \to \sum_{i=1}^{N} (1 - x_i)^{n}$ is a Schur-convex function on $[0, 1]^N$. To put it differently, the mapping $p \to \psi_n(p)$ is Schur-convex, and (11.5) indeed holds when $p^1 \prec p^2$. ■
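The two ingredients of this proof, a common mean equal to $N$ and ordered tail sums, can be checked numerically for a given pair of pmfs. In the sketch below the function name and the pmfs are illustrative choices; the tail sum is evaluated through the closed form $\sum_{i} (1 - p(i))^{n}$, which is valid for admissible $p$.

```python
def tail_sum(p, n):
    """Sum over tau >= n of P[T(R) > tau] under the IRM, i.e. sum_i (1 - p(i))**n."""
    return sum((1.0 - x) ** n for x in p)


p1 = [1 / 3, 1 / 3, 1 / 3]
p2 = [0.6, 0.3, 0.1]  # p1 is majorized by p2

# n = 0 gives the mean inter-reference time, which equals N for both pmfs.
mean_1, mean_2 = tail_sum(p1, 0), tail_sum(p2, 0)

# Condition (11.5) checked for n = 1, ..., 49 (a small tolerance absorbs rounding).
ordered = all(tail_sum(p1, n) <= tail_sum(p2, n) + 1e-12 for n in range(1, 50))
```

Both means equal 3 while the tail sums of the more skewed pmf dominate, which is what the convex ordering (11.4) requires.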


11.1.2  The  effect  of  temporal  correlations 

We now turn to the comparison (11.4) for the steady state inter-reference times when the request streams $R^1$ and $R^2$ are comparable in the TC ordering.

Theorem 11.3 Assume that for each $k = 1, 2$, the request stream $R^k$ is asymptotically stationary, i.e., $\{R_{t+\ell}^k, t = 0, 1, \ldots\} \Rightarrow_\ell \{\tilde{R}_t^k, t = 0, 1, \ldots\}$ where $\tilde{R}^k = \{\tilde{R}_t^k, t = 0, 1, \ldots\}$ is a stationary sequence of $\mathcal{N}$-valued rvs, and has admissible popularity pmf $p^k$. If $R^1 \le_{TC} R^2$, then the comparison (11.4) holds.

Theorem 11.3 states that the stronger the temporal correlations, the more variable the inter-reference time! To establish Theorem 11.3, we shall rely on the following lemma whose proof is available in Appendix E.4.

Lemma 11.4 Assume that the request stream $R = \{R_t, t = 0, 1, \ldots\}$ is asymptotically stationary, i.e., $\{R_{t+\ell}, t = 0, 1, \ldots\} \Rightarrow_\ell \{\tilde{R}_t, t = 0, 1, \ldots\}$ where $\tilde{R} = \{\tilde{R}_t, t = 0, 1, \ldots\}$ is a stationary sequence of $\mathcal{N}$-valued rvs, and has admissible popularity pmf $p$. Then, it holds that

$$\sum_{\tau=n}^{\infty} P[T(R) > \tau] = \sum_{i=1}^{N} P[\tilde{R}_\ell \ne i, \ell = 0, \ldots, n - 1], \quad n = 1, 2, \ldots, \qquad (11.7)$$

and

$$E[T(R)] = \sum_{\tau=0}^{\infty} P[T(R) > \tau] = N. \qquad (11.8)$$


Proof of Theorem 11.3. The proof of this theorem proceeds along lines similar to the ones found in the proof of Theorem 11.2. The comparison (11.4) is established by showing that (11.5) and (11.6) hold whenever $R^1 \le_{TC} R^2$.

Fix $k = 1, 2$. For each $i = 1, \ldots, N$, let $\{V_t^k(i), t = 0, 1, \ldots\}$ and $\{\tilde{V}_t^k(i), t = 0, 1, \ldots\}$ be the indicator sequences (9.1) associated with $R^k$ and $\tilde{R}^k$, respectively. From Lemma 11.4, the expression (11.7) for each $n = 1, 2, \ldots$, can be rewritten as

$$\begin{aligned}
\sum_{\tau=n}^{\infty} P[T(R^k) > \tau] &= \sum_{i=1}^{N} P[\tilde{R}_\ell^k \ne i, \ell = 0, \ldots, n - 1] \\
&= \sum_{i=1}^{N} E\left[\prod_{\ell=0}^{n-1} (1 - \tilde{V}_\ell^k(i))\right] \\
&= \sum_{i=1}^{N} E[\psi(\tilde{V}_0^k(i), \ldots, \tilde{V}_{n-1}^k(i))] \qquad (11.9)
\end{aligned}$$

where the mapping $\psi : \mathbb{R}^n \to \mathbb{R}$ is of the form (10.8) and (10.10). By Lemma 10.5, the mapping $\psi$ is supermodular.

For each $k = 1, 2$, the assumption $\{R_{t+\ell}^k, t = 0, 1, \ldots\} \Rightarrow_\ell \{\tilde{R}_t^k, t = 0, 1, \ldots\}$ yields

$$\{V_{t+\ell}^k(i), t = 0, 1, \ldots\} \Rightarrow_\ell \{\tilde{V}_t^k(i), t = 0, 1, \ldots\}, \quad i = 1, \ldots, N. \qquad (11.10)$$

But $R^1 \le_{TC} R^2$ implies the comparison $\{V_t^1(i), t = 0, 1, \ldots\} \le_{sm} \{V_t^2(i), t = 0, 1, \ldots\}$ for each $i = 1, \ldots, N$, and the sm comparison being closed under weak convergence [52, Thm. 3.9.8, p. 116], it is now plain from (11.10) that

$$\{\tilde{V}_t^1(i), t = 0, 1, \ldots\} \le_{sm} \{\tilde{V}_t^2(i), t = 0, 1, \ldots\}, \quad i = 1, \ldots, N. \qquad (11.11)$$

In short, $\tilde{R}^1 \le_{TC} \tilde{R}^2$ and the required condition (11.5) follows upon combining (11.11) with (11.9).

Lastly, under the assumptions of the theorem, we recall from Lemma 11.4 that $E[T(R^1)] = E[T(R^2)] = N$, and (11.6) holds. ■


The following results are obtained upon combining Theorem 11.3 with Theorems 9.3, 9.4 and 9.7, respectively.

Corollary 11.5 Assume the request stream $R = \{R_t, t = 0, 1, \ldots\}$ to be modeled according to the stationary HOMM($h$, $\alpha$, $p$) with admissible popularity pmf $p$. Then, it holds that

$$T(\hat{R}) \le_{cx} T(R)$$

where $\hat{R}$ is the IRM with popularity pmf $p$.

Corollary 11.6 Assume that for each $k = 1, 2$, the request stream $R^{\beta_k} = \{R_t^{\beta_k}, t = 0, 1, \ldots\}$ is modeled according to the stationary PMM($\beta_k$, $p$) with admissible popularity pmf $p$. If $0 < \beta_2 < \beta_1$, then it holds that

$$T(R^{\beta_1}) \le_{cx} T(R^{\beta_2}).$$

Corollary 11.7 Assume the request stream $R^{a} = \{R_t^{a}, t = 0, 1, \ldots\}$ to be modeled according to the stationary LRUSM($a$) with stack distance pmf $a$ satisfying (9.22). Then, it holds that

$$T(R^{u}) \le_{cx} T(R^{a})$$

where $R^{u}$ is the IRM with uniform popularity pmf $u$.
11.2 Stack distance

The notion of stack distance has been widely used as a metric for temporal correlations [1, 3, 50]: For each $t = 1, 2, \ldots$, the stack distance of the request stream $R = \{R_t, t = 0, 1, \ldots\}$ at time $t$ is the rv $D(t; R)$ defined by

$$D(t; R) = |\{R_{t - T(t;R) + 1}, \ldots, R_t\}| \qquad (11.12)$$

where $T(t; R)$ is the inter-reference time (11.1). It is not hard to see that the relation

$$D(t; R) = S(t, T(t; R); R) \qquad (11.13)$$

holds. In words, $D(t; R)$ can be interpreted as the working set size where the length of the working set is taken to be the inter-reference time $T(t; R)$. Hence, $D(t; R)$ records the number of distinct documents requested since the document $R_t$ was last requested before time $t$.
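The definitions (11.1), (11.12) and (11.13) translate directly into code. The sketch below assumes that the working set $W(t, \tau; R)$ consists of the requests $R_{(t-\tau+1)^+}, \ldots, R_t$, which is consistent with the way $W(t-1, \tau; R)$ is used in Section 10.4; function names and the trace are illustrative.

```python
def inter_reference_time(requests, t):
    """T(t; R) as in (11.1): lag to the previous request for R_t, or t + 1 if none."""
    for tau in range(1, t + 1):
        if requests[t - tau] == requests[t]:
            return tau
    return t + 1


def working_set_size(requests, t, tau):
    """S(t, tau; R): distinct documents among R_{(t-tau+1)^+}, ..., R_t (assumed convention)."""
    return len(set(requests[max(t - tau + 1, 0): t + 1]))


def stack_distance(requests, t):
    """D(t; R) computed through the relation (11.13)."""
    return working_set_size(requests, t, inter_reference_time(requests, t))


trace = [1, 2, 1, 3, 3, 2, 1]
times = [inter_reference_time(trace, t) for t in range(len(trace))]
dists = [stack_distance(trace, t) for t in range(1, len(trace))]
```

An immediate repeat (time 4 in the trace) has stack distance 1, since the window then contains only the document itself.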

Under some appropriate conditions on the request stream $\{R_t, t = 0, 1, \ldots\}$, the weak convergence $D(t; R) \Rightarrow_t D(R)$ holds with the steady state stack distance $D(R)$ being the rv representing the number of distinct documents requested between two consecutive requests for the same document. This fact is given in the next lemma whose proof can be found in Appendix E.5.

Lemma 11.8 Assume the request stream $R = \{R_t, t = 0, 1, \ldots\}$ to be asymptotically stationary, i.e., $\{R_{t+\ell}, t = 0, 1, \ldots\} \Rightarrow_\ell \{\tilde{R}_t, t = 0, 1, \ldots\}$ with $\tilde{R} = \{\tilde{R}_t, t = 0, 1, \ldots\}$ being a stationary sequence of $\mathcal{N}$-valued rvs. Then, it holds that

$$D(t; R) \Rightarrow_t D(R). \qquad (11.14)$$
It is known [33, 37] that the stack distance is related to the miss rate of the LRU replacement policy. Specifically, given a request stream $R$ such that the steady state stack distance $D(R)$ exists, the miss rate $M_{LRU}(R)$ of LRU with cache size $M$ can be expressed in terms of the tail distribution of $D(R)$ through

$$M_{LRU}(R) = P[D(R) > M]. \qquad (11.15)$$
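The relation (11.15) has a sample-path counterpart that can be checked on a finite trace: an LRU cache of size $M$ started empty misses a request exactly when the document was never requested before or its stack distance exceeds $M$. The sketch below is illustrative only; treating first requests as misses is a cold-start convention adopted here, which differs from the convention $T(t; R) = t + 1$ used in (11.1), and all names are assumptions.

```python
from collections import OrderedDict


def lru_misses(requests, cache_size):
    """Simulate LRU with the given cache size, starting from an empty cache."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = []
    for doc in requests:
        if doc in cache:
            cache.move_to_end(doc)
            misses.append(False)
        else:
            misses.append(True)
            cache[doc] = None
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used document
    return misses


def stack_distance_or_none(requests, t):
    """Distinct documents in R_{s+1}, ..., R_t, with s the previous request time of R_t."""
    for s in range(t - 1, -1, -1):
        if requests[s] == requests[t]:
            return len(set(requests[s + 1: t + 1]))
    return None  # first request for this document


def misses_from_stack_distance(requests, cache_size):
    """Predict LRU misses: first request, or stack distance larger than the cache size."""
    out = []
    for t in range(len(requests)):
        d = stack_distance_or_none(requests, t)
        out.append(d is None or d > cache_size)
    return out


trace = [1, 2, 1, 3, 3, 2, 1, 4, 2, 3, 1, 1, 4]
agree = all(lru_misses(trace, m) == misses_from_stack_distance(trace, m)
            for m in (1, 2, 3, 4))
```

The agreement holds for every cache size because the documents ahead of $R_t$ in the LRU stack are exactly the distinct documents requested since its previous request.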

11.2.1  The  effect  of  popularity 

To see the effect of popularity, we restrict the request streams to be in the class of IRMs, in which case the steady state stack distances exist by Lemma 11.8. From (11.13), in view of the results obtained in Corollary 10.3, we might expect that for two IRM request streams $R^1$ and $R^2$ with popularity pmfs $p^1$ and $p^2$, respectively, the comparison

$$D(R^2) \le_{st} D(R^1) \qquad (11.16)$$

should hold if $p^1 \prec p^2$. However, the comparison (11.16) cannot be established, as we explain below: Recall the relation (11.15) between the miss rate of the LRU policy and the tail distribution of the stack distance. In Section 8.1, we have seen that it is possible to find pmfs $p^1$ and $p^2$ on $\mathcal{N}$ such that $p^1 \prec p^2$ and yet $M_{LRU}(p^1) < M_{LRU}(p^2)$, or equivalently, $P[D(R^1) > M] < P[D(R^2) > M]$. As we recall (3.2), we conclude that the comparison (11.16) does not hold in general.

Although somewhat annoying from the point of view of intuition, this state of affairs is perhaps not too surprising (in view of (11.13)) given the opposite direction of the comparison of inter-reference times in Theorem 11.2. It is possible that some comparison other than (11.16) might hold, say in the increasing concave ordering, i.e., for two IRM request streams $R^1$ and $R^2$ with popularity pmfs $p^1$ and $p^2$, respectively, it holds

$$D(R^2) \le_{icv} D(R^1) \qquad (11.17)$$

whenever $p^1 \prec p^2$. This comparison is compatible with the weaker result of Yue and Wong [73] that the comparison $E[D(R^2)] \le E[D(R^1)]$ holds whenever $p^1 \prec p^2$.

11.2.2  The  effect  of  temporal  correlations 

Inspired by the results obtained for the working set size in Corollary 10.6, we would expect that the stronger the temporal correlations, the smaller the stack distance. Unfortunately, we have not yet been able to formalize this statement and pose this problem as the following conjecture.

Conjecture 11.9 Assume that for each $k = 1, 2$, the request stream $R^k$ is asymptotically stationary, i.e., $\{R_{t+\ell}^k, t = 0, 1, \ldots\} \Rightarrow_\ell \{\tilde{R}_t^k, t = 0, 1, \ldots\}$ where $\tilde{R}^k = \{\tilde{R}_t^k, t = 0, 1, \ldots\}$ is a stationary sequence of $\mathcal{N}$-valued rvs. If $R^1 \le_{TC} R^2$, then it holds that

$$E[D(R^2)] \le E[D(R^1)].$$

Support for this conjecture is available in the class of PMM request streams: For this class of request streams, we have from Theorem 9.8 that if $R^{\beta_1}$ and $R^{\beta_2}$ are modeled according to the PMM($\beta_1$, $p$) and PMM($\beta_2$, $p$), respectively, with $0 < \beta_2 < \beta_1$ (i.e., $R^{\beta_1} \le_{TC} R^{\beta_2}$), then $M_{LRU}(R^{\beta_2}) \le M_{LRU}(R^{\beta_1})$ for all cache sizes $M = 1, \ldots, N - 1$. It then follows from the relation (11.15) that $P[D(R^{\beta_2}) > M] \le P[D(R^{\beta_1}) > M]$ for each $M = 1, 2, \ldots, N - 1$, or equivalently, that

$$D(R^{\beta_2}) \le_{st} D(R^{\beta_1}) \qquad (11.18)$$

by the property (3.2) of the usual stochastic ordering. Conjecture 11.9 holds under the class of PMM request streams since (11.18) implies $E[D(R^{\beta_2})] \le E[D(R^{\beta_1})]$.
Appendix A


A  Discussion  of  Lemmas  7.1  and  7.2 


Consider the RORA($r$) policy for some eviction/insertion pmf $r$. As pointed out in Sections 7.1.1 and 7.1.2, under the IRM input, the cache states $\{\Omega_t, t = 0, 1, \ldots\}$ form a Markov chain with state space $\Lambda(M; \mathcal{N})$ whose ergodic properties are determined through the set $\Sigma_r$.

Fix the cache state $s = (i_1, \ldots, i_M)$ in $\Lambda(M; \mathcal{N})$, and for each $k, \ell = 1, \ldots, M$, define the set $\Gamma_{k\ell}(s)$ as the collection of states which can reach $s$ in one step when the eviction and insertion occur at positions $k$ and $\ell$, respectively. Thus,

$$\Gamma_{k\ell}(s) = \begin{cases}
\{s' = (i_1, \ldots, i_{k-1}, i, i_k, \ldots, i_{\ell-1}, i_{\ell+1}, \ldots, i_M) : i \notin s\} & \text{if } k < \ell \\
\{s' = (i_1, \ldots, i_{\ell-1}, i_{\ell+1}, \ldots, i_k, i, i_{k+1}, \ldots, i_M) : i \notin s\} & \text{if } k > \ell \\
\{s' = (i_1, \ldots, i_{\ell-1}, i, i_{\ell+1}, \ldots, i_M) : i \notin s\} & \text{if } k = \ell.
\end{cases}$$

Lemma A.1 Fix $t = 0, 1, \ldots$. For each cache state $s = (i_1, \ldots, i_M)$ in $\Lambda(M; \mathcal{N})$, we have


P  [flj+i  —  s 


s 


M  M 
i^Ls  k= 1  i—  1 


E  pP< 

,s/£=r  ki(s) 


(A.l) 
Proof. Fix $t = 0, 1, \ldots$. Obviously, we have

$$\begin{aligned}
P[\Omega_{t+1} = s] &= P[\Omega_{t+1} = s, R_t \in S_t] + P[\Omega_{t+1} = s, R_t \notin S_t] \\
&= P[\Omega_t = s, R_t \in S_t] + P[\Omega_{t+1} = s, R_t \notin S_t] \qquad (A.2)
\end{aligned}$$

because the cache state remains unchanged if the requested document is in cache. Next, by independence,


P  [nt  =  s,RteSt\ 


N 

£p  [nt  =  s,Rt  =  i,iest\ 

i— 1 


(A. 3) 


since $S_t$ is determined by $\Omega_t$. Similarly,


N 


p  [nt+1  =  s,Rt^st\  =  Y,pint+i  =  s,Rt  =  i,i?st 


i= 1 

=  £  P  [Qt+1  =  s,Rt  =  i] 

iqLs 

M  M 

=  EEE  £  p [n,  =  s',n,+1  =  s,js,  =  i] 

i^s  k= 1 1= i  s'er^(s) 

M  M 

=  Y.Y2Y1  pay™?  =  s'] 

i^s  k= 1  £=1  s'sr*.£(s) 

MM  ( 

=  &C0  £  £  £  p  [fit  = «'] 

\s'erk£(s) 


(A.4) 


k=  1  £=1 

We obtain (A.1) by collecting (A.3) and (A.4) into (A.2). ■


Case 1 - The set $\Sigma_r$ being empty, the Markov chain has exactly one irreducible component, namely $\Lambda(r, s_0) = \Lambda(M; \mathcal{N})$ regardless of the initial condition $s_0$, with

$$\mu_r(s; p) = \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[\Omega_\tau = s] = \lim_{t \to \infty} P[\Omega_t = s] \quad \text{a.s.}$$

for each $s$ in $\Lambda(M; \mathcal{N})$. Letting $t$ go to infinity in (A.1), we conclude by the standard theory of Markov chains that $\{\mu_r(s; p), s \in \Lambda(M; \mathcal{N})\}$ given in (7.3)-(7.4) of Lemma 7.1 is indeed the stationary pmf of this Markov chain since it satisfies the Global Balance Equations


f*r(s;p)  =  Ms;p)  •  (a.5) 

\i£s  J  igs  k=ll=l  \s'erM(s)  / 


We now discuss the technical issues which arise when $N = M + 1$. In this case, the analysis that we have done so far holds for all RORA($r$) policies in Case 1 except the FIFO policy with either $r_{1M} = 1$ or $r_{M1} = 1$. In this particular case, if $s_0 = (i_1, \ldots, i_M)$, then only $M + 1$ states can be reached from $s_0$, i.e., $\Lambda(r, s_0)$ contains the elements $(i_1, \ldots, i_M)$, $(i_2, \ldots, i_M, i_{M+1})$, $(i_3, \ldots, i_{M+1}, i_1)$, $\ldots$, $(i_{M+1}, i_1, \ldots, i_{M-1})$. This state space $\Lambda(r, s_0)$ is equivalent to the set $\Lambda^\star(M; \mathcal{N})$ and it can be verified using the Global Balance Equations (A.5) that the stationary pmf is given by

$$\mu_r(s; p) = \frac{p(i_1) \cdots p(i_M)}{\sum_{\{j_1, \ldots, j_M\} \in \Lambda^\star(M; \mathcal{N})} p(j_1) \cdots p(j_M)} \qquad (A.6)$$

with $s = (i_1, \ldots, i_M)$ arbitrary in $\Lambda(r, s_0)$. Finally, with the stationary pmf (A.6) and $N = M + 1$, it is plain that the miss rate $M_r(p)$ and the output popularity pmf $p_r^\star$ in this case are still given by (7.17) and (6.8), respectively, independently of the initial cache state $s_0$.


Case 2 - The set $\Sigma_r$ is non-empty with $|\Sigma_r| = m$ for some $m = 1, \ldots, M - 1$. As discussed in Section 7.1.2, if the Markov chain starts in the initial state $s_0$ in $\Lambda(M; \mathcal{N})$, it will always stay within the component $\Lambda(r, s_0)$ defined at (7.7). On this component $\Lambda(r, s_0)$, the Markov chain is irreducible and aperiodic; its stationary pmf exists for each $s$ in $\Lambda(r, s_0)$. It is a simple matter to check that the pmf $\{\mu_{r,s_0}(s), s \in \Lambda(r, s_0)\}$ given in (7.9)-(7.10) of Lemma 7.2 satisfies the Global Balance Equations (A.5) and hence it is a stationary pmf for this Markov chain.

In the case when $N = M + 1$, the analysis still holds for all RORA($r$) policies in Case 2 with the exception of FIFO-like policies, i.e., the RORA($r$) policy with $r_{k\ell} = 1$ for some $k, \ell = 1, \ldots, M$ and $|\Sigma_r| = m$ for some $m = 1, \ldots, M - 1$. For this special case, for the same reasons as in Case 1, the state space $\Lambda(r, s_0)$ has only $M - m + 1$ elements and coincides with the set $\Lambda^\star(r, s_0)$ defined at (7.22). We again use the Global Balance Equations (A.5) to show that the stationary pmf is given by


f^r.so  (s,  p) 


_ Hi(^r{s0)P(h) _ 


(A.7) 


where $s = (i_1, \ldots, i_M)$ is arbitrary in $\Lambda(r, s_0)$. It is easy to check in this case that with the stationary pmf given in (A.7), the miss rate $M_r(p; s_0)$ and the output popularity pmf $p^\star_{r,s_0}$ also admit the expressions (7.26) and (7.32), respectively.
Appendix B


Proofs  of  Theorems  8.1,  8.6,  8.8,  8.12  and  10.11 


Throughout, the notion of asymptotic equivalence is defined as follows: For mappings
$f, g : \mathbb{R}_+ \to \mathbb{R}$, we write $f(\alpha) \sim g(\alpha)$ ($\alpha \to \infty$) if $\lim_{\alpha \to \infty} f(\alpha)/g(\alpha) = 1$. We shall make
repeated use of the next two elementary lemmas.


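Lemma B.1 Consider a finite family $a_1, \ldots, a_K$ of positive scalars. We have

\[
\sum_{k=1}^{K} a_k^{-\alpha} \sim c \cdot \Big( \min_{k=1,\ldots,K} a_k \Big)^{-\alpha} \qquad (\alpha \to \infty)
\]

where $c$ denotes the number of indices $\ell$ for which it holds $a_\ell = \min_{k=1,\ldots,K} a_k$.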


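Lemma B.2 Consider $2K$ mappings $f_1, \ldots, f_K, g_1, \ldots, g_K : \mathbb{R}_+ \to \mathbb{R}_+$ such that for each
$k = 1, \ldots, K$, we have $f_k(\alpha) \sim g_k(\alpha)$ as $\alpha \to \infty$. Then, it holds that

\[
\sum_{k=1}^{K} f_k(\alpha) \sim \sum_{k=1}^{K} g_k(\alpha) \qquad (\alpha \to \infty).
\]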


From now on, without further mention, all asymptotics are understood in the regime
where $\alpha$ is large, and the qualifier $\alpha \to \infty$ is dropped from the notation. In particular,
by recalling the normalizing constant $C_\alpha(N)$ of Zipf-like distributions defined at (6.5),
we note that

\[
C_\alpha(N) \sim 1. \tag{B.1}
\]
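As a numerical illustration of Lemma B.1 and of (B.1), the following minimal sketch evaluates the ratio of both sides of the lemma for increasing values of the exponent. The function names and the sample family of scalars are hypothetical choices made for this illustration, and the normalizing constant is assumed to be $C_\alpha(N) = \sum_{i=1}^N i^{-\alpha}$.

```python
def lemma_b1_ratio(a, alpha):
    """Return sum_k a_k^(-alpha) divided by c * (min_k a_k)^(-alpha)."""
    m = min(a)
    c = sum(1 for x in a if x == m)      # number of indices achieving the minimum
    # Dividing every term by m^(-alpha) keeps the numbers in floating-point range.
    scaled_sum = sum((x / m) ** (-alpha) for x in a)
    return scaled_sum / c


def zipf_normalizer(N, alpha):
    """C_alpha(N) = sum_{i=1}^N i^(-alpha) (assumed form of the constant in (6.5))."""
    return sum(float(i) ** (-alpha) for i in range(1, N + 1))


a = [2.0, 3.0, 2.0, 5.0]                 # the minimum 2.0 is achieved twice, so c = 2
for alpha in (1, 5, 20, 80):
    print(alpha, lemma_b1_ratio(a, alpha), zipf_normalizer(10, alpha))
```

Both printed columns approach 1 as the exponent grows, in line with the lemma and with (B.1).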
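B.1 A proof of Theorem 8.1


Fix $\alpha \geq 0$. Upon substituting (6.4)-(6.5) into the expression (8.4), we find

\[
M_{LRU}(p_\alpha) = \frac{1}{C_\alpha(N)^2} \sum_{i=1}^{N} i^{-\alpha} \nu_\alpha(i) \tag{B.2}
\]

with

\[
\nu_\alpha(i) = \sum_{s \in \Lambda_i(M;\mathcal{N})} \frac{\prod_{k=1}^{M} i_k^{-\alpha}}{\prod_{k=1}^{M-1} \Big( \sum_{j \notin \{i_1,\ldots,i_k\}} j^{-\alpha} \Big)}, \qquad i = 1, \ldots, N, \tag{B.3}
\]

where for each element $s = (i_1, \ldots, i_M)$ of $\Lambda_i(M;\mathcal{N})$, we have denoted by $j \notin \{i_1, \ldots, i_k\}$
the set of elements $j$ in $\mathcal{N}$ which are not in the set $\{i_1, \ldots, i_k\}$.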

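Fix $i = 1, 2, \ldots, N$. For each element $s = (i_1, \ldots, i_M)$ in $\Lambda_i(M;\mathcal{N})$, we invoke
Lemma B.1 to claim that

\[
\sum_{j \notin \{i_1,\ldots,i_k\}} j^{-\alpha} \sim \Big( \min_{j \notin \{i_1,\ldots,i_k\}} j \Big)^{-\alpha}, \qquad k = 1, \ldots, M-1,
\]

whence

\[
\frac{\prod_{k=1}^{M} i_k^{-\alpha}}{\prod_{k=1}^{M-1} \Big( \sum_{j \notin \{i_1,\ldots,i_k\}} j^{-\alpha} \Big)} \sim \Big( \frac{\prod_{k=1}^{M} i_k}{\rho(s)} \Big)^{-\alpha}
\]

where we have set

\[
\rho(s) := \prod_{k=1}^{M-1} \min_{j \notin \{i_1,\ldots,i_k\}} j.
\]

Lemmas B.1 and B.2 together yield

\[
\nu_\alpha(i) \sim \sum_{s \in \Lambda_i(M;\mathcal{N})} \Big( \frac{\prod_{k=1}^{M} i_k}{\rho(s)} \Big)^{-\alpha} \sim c(i) \cdot \nu(i)^{-\alpha} \tag{B.4}
\]

where

\[
\nu(i) := \min_{s \in \Lambda_i(M;\mathcal{N})} \frac{\prod_{k=1}^{M} i_k}{\rho(s)} \tag{B.5}
\]

and $c(i)$ is the number of elements $s$ in $\Lambda_i(M;\mathcal{N})$ which achieve the minimum in (B.5).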


To proceed we note the obvious inequality

\[
\nu(i) \geq \frac{\min_{s \in \Lambda_i(M;\mathcal{N})} \prod_{k=1}^{M} i_k}{\max_{s \in \Lambda_i(M;\mathcal{N})} \rho(s)}. \tag{B.6}
\]

We shall show the existence of element(s) $s$ in $\Lambda_i(M;\mathcal{N})$ which simultaneously achieve
the minimum in

\[
\min_{s \in \Lambda_i(M;\mathcal{N})} \prod_{k=1}^{M} i_k \tag{B.7}
\]

and the maximum in

\[
\max_{s \in \Lambda_i(M;\mathcal{N})} \rho(s). \tag{B.8}
\]

This will imply that (B.6) holds as an equality, and in the process both the minimal value
$\nu(i)$ and the integer $c(i)$ will be determined.

For $i = M+1, \ldots, N$, it is plain that $s = (1, \ldots, M)$ is the only element in
$\Lambda_i(M;\mathcal{N})$ achieving both the minimum (B.7) with minimal value $M!$ and the maximum
(B.8) with maximal value $M!$. This last claim can be established by easy interchange
arguments. Thus, $c(i) = 1$ and

\[
\nu(i) = \frac{M!}{M!} = 1. \tag{B.9}
\]

Similarly, when $i = 2, \ldots, M$, the element $s = (1, \ldots, i-1, i+1, \ldots, M, M+1)$
of $\Lambda_i(M;\mathcal{N})$ yields the minimum (B.7) with minimal value $\prod_{\ell=1, \ell \neq i}^{M+1} \ell$ and the
maximum (B.8) with maximal value $\prod_{\ell=2}^{i-1} \ell \cdot i^{M-i+1}$, whence $c(i) = 1$ and

\[
\nu(i) = \frac{\prod_{\ell=1, \ell \neq i}^{M+1} \ell}{\prod_{\ell=2}^{i-1} \ell \cdot i^{M-i+1}} = \frac{(M+1)!}{i! \, i^{M-i+1}}. \tag{B.10}
\]

For $i = 1$, $\rho(s) = 1$ for any element $s$ in $\Lambda_1(M;\mathcal{N})$ so that the maximum (B.8) has value
1. On the other hand, the minimum (B.7) is achieved by any of the $M!$ permutations of
$(2, 3, \ldots, M, M+1)$, yielding the minimal value $(M+1)!$. Hence, $c(1) = M!$ and

\[
\nu(1) = (M+1)! \tag{B.11}
\]

which is simply (B.10) at $i = 1$.
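The closed forms (B.9)-(B.11) can be checked by brute force on a small instance. The sketch below enumerates all ordered $M$-tuples of distinct documents not containing $i$, evaluates the ratio appearing in (B.5) with $\rho(s)$ taken as the product of the smallest documents missing from each prefix, and compares the minimum with the closed form. The function names are hypothetical, and the enumeration is only practical for small $M$ and $N$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def rho(s, N):
    """rho(s): product over k = 1..M-1 of the smallest j in {1..N} not in {i_1..i_k}."""
    out = 1
    for k in range(1, len(s)):
        seen = set(s[:k])
        out *= min(j for j in range(1, N + 1) if j not in seen)
    return out


def nu_brute_force(i, M, N):
    """Minimum over states s not containing i of prod_k i_k / rho(s), as in (B.5)."""
    best = None
    for s in permutations([j for j in range(1, N + 1) if j != i], M):
        prod = 1
        for x in s:
            prod *= x
        value = Fraction(prod, rho(s, N))
        if best is None or value < best:
            best = value
    return best


def nu_closed_form(i, M):
    """(B.10) for i = 1..M (which covers (B.11) at i = 1) and (B.9) for i > M."""
    if i <= M:
        return Fraction(factorial(M + 1), factorial(i) * i ** (M - i + 1))
    return Fraction(1)


M, N = 3, 5
for i in range(1, N + 1):
    print(i, nu_brute_force(i, M, N), nu_closed_form(i, M))
```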

Invoking Lemmas B.1 and B.2 again, we find

\[
\sum_{i=1}^{N} i^{-\alpha} \nu_\alpha(i) \sim c \cdot \Big( \min_{i=1,\ldots,N} i \nu(i) \Big)^{-\alpha} \tag{B.12}
\]

for some integer $c$ to be determined. It follows from (B.9) that

\[
\min_{i=M+1,\ldots,N} i \nu(i) = \min_{i=M+1,\ldots,N} i = M+1 \tag{B.13}
\]

and (B.10) allows us to write

\[
\min_{i=1,\ldots,M} i \nu(i) = (M+1) \min_{i=1,\ldots,M} \varphi(i) \tag{B.14}
\]

with

\[
\varphi(i) := \frac{M!}{i! \, i^{M-i}}, \qquad i = 1, \ldots, M. \tag{B.15}
\]

It is a simple matter to check that

\[
M! = \varphi(1) > \varphi(2) > \ldots > \varphi(M) = 1 \tag{B.16}
\]

so that the minimum in (B.14) is achieved at $i = M$ with minimal value $M+1$. It then
follows from this fact and (B.13) that

\[
\min_{i=1,\ldots,N} i \nu(i) = M+1 \tag{B.17}
\]

and $c = 2$. Finally, combining (B.1), (B.2), (B.12) and (B.17) readily leads to

\[
M_{LRU}(p_\alpha) \sim 2 (M+1)^{-\alpha} \tag{B.18}
\]

and the desired conclusion (8.7) is obtained. ■
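The asymptotics (B.18) can be observed numerically on a small cache. The sketch below is an illustration under stated assumptions: it takes the stationary pmf of LRU under the IRM to be the classical product form $\prod_{k=1}^{M} p(i_k) / \prod_{k=1}^{M-1} (1 - p(i_1) - \ldots - p(i_k))$, which is assumed to agree with the expression behind (8.4), and it restricts the exponent to non-negative integers so that exact rational arithmetic can be used. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def zipf_pmf(N, alpha):
    """Zipf-like pmf p_alpha(i) = i^(-alpha) / C_alpha(N), i = 1..N, as exact fractions."""
    weights = [Fraction(1, i ** alpha) for i in range(1, N + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def lru_miss_rate(p, M):
    """Exact LRU miss rate under the IRM, summing over all ordered cache states."""
    miss = Fraction(0)
    for s in permutations(range(len(p)), M):
        weight, used = Fraction(1), Fraction(0)
        for k, doc in enumerate(s):
            weight *= p[doc]
            used += p[doc]
            if k < M - 1:                # denominators 1 - p(i_1) - ... - p(i_k), k < M
                weight /= 1 - used
        miss += weight * (1 - used)      # a miss occurs when the request falls outside s
    return miss


M, N = 2, 4
for alpha in (5, 10, 20, 40):
    ratio = lru_miss_rate(zipf_pmf(N, alpha), M) / (2 * Fraction(1, (M + 1) ** alpha))
    print(alpha, float(ratio))
```

The printed ratio of the exact miss rate to $2(M+1)^{-\alpha}$ approaches 1 as the exponent increases.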


B.2  A  proof  of  Theorem  8.6 

First, in order to lighten the notation, let $p^*_\alpha$ denote $p^*_{LRU,\alpha}$. The proof of Theorem
8.6 relies on the following observation: By the definition of majorization (2.1)-(2.2), the
comparison $p^*_\alpha \prec p_\alpha$ requires the condition

\[
\min_{i=1,\ldots,N} p_\alpha(i) \leq \min_{i=1,\ldots,N} p^*_\alpha(i) \tag{B.19}
\]

to hold. Thus, as we recall (6.6), this comparison will not hold if we can show that

\[
C_\alpha(N) N^\alpha \cdot \min_{i=1,\ldots,N} p^*_\alpha(i) < 1. \tag{B.20}
\]

We show under the appropriate conditions on $M$ and $N$ that (B.20) indeed holds for
large enough values of $\alpha$.

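Fix $\alpha > 0$ and substitute (6.4)-(6.5) into the expression (8.11) for the pmf $p^*_\alpha$. For
each $i = 1, \ldots, N$, we find

\[
p^*_\alpha(i) = \frac{i^{-\alpha} \nu_\alpha(i)}{\sum_{j=1}^{N} j^{-\alpha} \nu_\alpha(j)} \tag{B.21}
\]

with $\nu_\alpha(i)$, $i = 1, \ldots, N$, given at (B.3). By virtue of (B.4), (B.12), (B.17) and (B.21),
we can now write

\[
p^*_\alpha(i) \sim \frac{c(i)}{2} \Big( \frac{M+1}{i \nu(i)} \Big)^{\alpha}, \qquad i = 1, \ldots, N.
\]

Consequently,

\[
\min_{i=1,\ldots,N} p^*_\alpha(i) \sim \frac{1}{2} \min_{i=1,\ldots,N} \Big( c(i) \Big( \frac{M+1}{i \nu(i)} \Big)^{\alpha} \Big). \tag{B.22}
\]

By recalling (B.9), we get

\[
\min_{i=M+1,\ldots,N} c(i) \Big( \frac{M+1}{i \nu(i)} \Big)^{\alpha} = \Big( \frac{M+1}{N} \Big)^{\alpha} \tag{B.23}
\]

where the minimum is achieved at $i = N$. Next, by using (B.10), we get with the help
of (B.15) and (B.16) that

\[
\min_{i=2,\ldots,M} c(i) \Big( \frac{M+1}{i \nu(i)} \Big)^{\alpha} = \Big( \frac{2^{M-1}}{M!} \Big)^{\alpha} \tag{B.24}
\]

where the minimum is achieved at $i = 2$. Finally, $\nu(1) = (M+1)!$ and $c(1) = M!$ yield

\[
c(1) \Big( \frac{M+1}{\nu(1)} \Big)^{\alpha} = \frac{1}{(M!)^{\alpha-1}}. \tag{B.25}
\]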
Combining (B.1), (B.23), (B.24) and (B.25), we conclude from (B.22) that

\[
C_\alpha(N) N^\alpha \cdot \min_{i=1,\ldots,N} p^*_\alpha(i) \sim \frac{1}{2} \min \Big( M! \Big( \frac{N}{M!} \Big)^{\alpha}, \Big( \frac{2^{M-1} N}{M!} \Big)^{\alpha}, (M+1)^{\alpha} \Big).
\]

Under (8.26), as $\alpha$ grows large, the first term in the minimum above will have the smallest
value, so

\[
C_\alpha(N) N^\alpha \cdot \min_{i=1,\ldots,N} p^*_\alpha(i) \sim \frac{M!}{2} \Big( \frac{N}{M!} \Big)^{\alpha}
\]

and the condition (B.20) indeed holds for large enough values of $\alpha$.


B.3  A  proof  of  Theorem  8.8 


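Fix $\alpha \geq 0$. By substituting (6.4)-(6.5) into the expression (8.30), we find

\[
M_{CL}(p_\alpha) = \frac{1}{C_\alpha(N) K_{CL,\alpha}} \sum_{i=1}^{N} i^{-\alpha} \eta_\alpha(i) \tag{B.26}
\]

with

\[
\eta_\alpha(i) = \sum_{s \in \Lambda_i(M;\mathcal{N})} \prod_{\ell=1}^{M} i_\ell^{-\alpha(M-\ell+1)}, \qquad i = 1, \ldots, N, \tag{B.27}
\]

and

\[
K_{CL,\alpha} = \sum_{s \in \Lambda(M;\mathcal{N})} \prod_{\ell=1}^{M} i_\ell^{-\alpha(M-\ell+1)}. \tag{B.28}
\]

Fix $i = 1, \ldots, N$. By Lemma B.1 we immediately get

\[
\eta_\alpha(i) \sim d(i) \cdot \eta(i)^{-\alpha} \tag{B.29}
\]

with

\[
\eta(i) := \min_{s \in \Lambda_i(M;\mathcal{N})} \prod_{\ell=1}^{M} i_\ell^{M-\ell+1} \tag{B.30}
\]

and $d(i)$ is the number of elements $s$ in $\Lambda_i(M;\mathcal{N})$ that achieve the minimum in (B.30).
Elementary interchange arguments show that the minimal value in (B.30) is achieved at
some unique element $s = (i_1, \ldots, i_M)$ of $\Lambda_i(M;\mathcal{N})$ with the property $i_1 < i_2 < \ldots < i_M$,
so that $d(i) = 1$.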

Using this observation, we first conclude that

\[
\eta(M+1) = \ldots = \eta(N) = \prod_{\ell=1}^{M} \ell^{M-\ell+1}. \tag{B.31}
\]

On the other hand, whenever $i = 1, \ldots, M$, direct inspection shows that

\[
\begin{aligned}
i \eta(i) &= i (M+1) \prod_{1 \leq \ell < i} \ell^{M-\ell+1} \prod_{i < \ell \leq M} \ell^{M-\ell+2} \\
&= (M+1) \eta(M+1) \cdot \frac{\prod_{i < \ell \leq M} \ell^{M-\ell+2}}{i^{M-i} \prod_{i < \ell \leq M} \ell^{M-\ell+1}} \\
&= (M+1) \eta(M+1) \varphi(i)
\end{aligned} \tag{B.32}
\]

where the quantities $\varphi(i)$, $i = 1, \ldots, M$, are defined at (B.15).
Next, upon making use of Lemmas B.1 and B.2, we see that

\[
\sum_{i=1}^{N} i^{-\alpha} \eta_\alpha(i) \sim d \cdot \Big( \min_{i=1,\ldots,N} i \eta(i) \Big)^{-\alpha} \tag{B.33}
\]

with $d$ denoting the number of indices achieving the minimum in $\min_{i=1,\ldots,N} i \eta(i)$.
Obviously, by virtue of (B.31), we find

\[
\min_{i=M+1,\ldots,N} i \eta(i) = (M+1) \eta(M+1) \tag{B.34}
\]

where the minimum is achieved at $i = M+1$. On the other hand, as we rely on (B.32),
we get

\[
\min_{i=1,\ldots,M} i \eta(i) = (M+1) \eta(M+1) \min_{i=1,\ldots,M} \varphi(i) \tag{B.35}
\]

and by (B.16), the minimum in (B.35) is achieved at $i = M$ with minimal value
$(M+1) \eta(M+1)$. Combining this fact with (B.34), we obtain $d = 2$ and

\[
\min_{i=1,\ldots,N} i \eta(i) = (M+1) \eta(M+1). \tag{B.36}
\]
Lastly, invoking Lemma B.1 with (B.28) leads to

\[
K_{CL,\alpha} \sim \Big( \min_{s \in \Lambda(M;\mathcal{N})} \prod_{\ell=1}^{M} i_\ell^{M-\ell+1} \Big)^{-\alpha} = \Big( \prod_{\ell=1}^{M} \ell^{M-\ell+1} \Big)^{-\alpha} = \eta(M+1)^{-\alpha}. \tag{B.37}
\]

It is now plain to see from (B.1), (B.26), (B.33), (B.36) and (B.37) that

\[
M_{CL}(p_\alpha) \sim 2 (M+1)^{-\alpha} \tag{B.38}
\]

and the conclusion (8.32) follows.
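A small numerical check of (B.38) is sketched below. It assumes that the stationary pmf of the CLIMB policy under the IRM is proportional to $\prod_{\ell=1}^{M} p(i_\ell)^{M-\ell+1}$, which is the form suggested by (B.27)-(B.28), and it uses integer exponents so that the computation is exact. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def zipf_pmf(N, alpha):
    """Zipf-like pmf p_alpha(i) = i^(-alpha) / C_alpha(N), i = 1..N, as exact fractions."""
    weights = [Fraction(1, i ** alpha) for i in range(1, N + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def climb_miss_rate(p, M):
    """Miss rate when the state s has weight prod_l p(i_l)^(M-l+1) (assumed CLIMB form)."""
    total, miss = Fraction(0), Fraction(0)
    for s in permutations(range(len(p)), M):
        weight = Fraction(1)
        for pos, doc in enumerate(s):            # pos = 0 is the top cache position
            weight *= p[doc] ** (M - pos)
        total += weight
        miss += weight * (1 - sum(p[doc] for doc in s))
    return miss / total


M, N = 2, 4
for alpha in (5, 10, 20, 40):
    ratio = climb_miss_rate(zipf_pmf(N, alpha), M) / (2 * Fraction(1, (M + 1) ** alpha))
    print(alpha, float(ratio))
```

As with LRU, the ratio to $2(M+1)^{-\alpha}$ tends to 1, which illustrates that both policies share the same leading asymptotics for the miss rate.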


B.4  A  proof  of  Theorem  8.12 


To simplify the notation, we shall write $p^*_\alpha$ to denote $p^*_{CL,\alpha}$. The proof of this theorem
proceeds along the same lines as the proof of Theorem 8.6. We need to show under the
appropriate conditions on $M$ and $N$ that (B.20) holds for large enough values of $\alpha$.

Fix $\alpha > 0$. Substituting (6.4)-(6.5) into the expression (8.34) yields

\[
p^*_\alpha(i) = \frac{i^{-\alpha} \eta_\alpha(i)}{\sum_{j=1}^{N} j^{-\alpha} \eta_\alpha(j)}, \qquad i = 1, \ldots, N, \tag{B.39}
\]

with $\eta_\alpha(i)$, $i = 1, \ldots, N$, given at (B.27). With the help of (B.29), (B.33), (B.36) and
(B.39), we can now write

\[
p^*_\alpha(i) \sim \frac{1}{2} \Big( \frac{(M+1) \eta(M+1)}{i \eta(i)} \Big)^{\alpha}, \qquad i = 1, \ldots, N. \tag{B.40}
\]

Therefore, we obtain

\[
\min_{i=1,\ldots,N} p^*_\alpha(i) \sim \frac{1}{2} \Big( \frac{(M+1) \eta(M+1)}{\max_{i=1,\ldots,N} i \eta(i)} \Big)^{\alpha}. \tag{B.41}
\]

Upon noting (B.31), it is a simple matter to check that

\[
\max_{i=M+1,\ldots,N} i \eta(i) = N \cdot \eta(M+1) \tag{B.42}
\]

and from (B.32), it follows from the fact (B.16) that

\[
\max_{i=1,\ldots,M} i \eta(i) = (M+1)! \cdot \eta(M+1). \tag{B.43}
\]

As a result of (B.42) and (B.43), we find

\[
\max_{i=1,\ldots,N} i \eta(i) = \max((M+1)!, N) \cdot \eta(M+1). \tag{B.44}
\]

To conclude the proof, we note from (B.1), (B.41) and (B.44) that

\[
C_\alpha(N) N^\alpha \cdot \min_{i=1,\ldots,N} p^*_\alpha(i) \sim \frac{1}{2} \Big( \frac{(M+1) N}{\max((M+1)!, N)} \Big)^{\alpha}
\]

with $\max((M+1)!, N) = (M+1)!$ under (8.26). Consequently, the last asymptotics
takes the simplified form

\[
C_\alpha(N) N^\alpha \cdot \min_{i=1,\ldots,N} p^*_\alpha(i) \sim \frac{1}{2} \Big( \frac{N}{M!} \Big)^{\alpha}
\]

and the validity of (B.20) for large enough values of $\alpha$ follows.


B.5  A  proof  of  Theorem  10.11 


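To simplify the notation, the output pmf $p^*_{WS,\alpha}$ will be denoted by $p^*_\alpha$. As in the proof
of Theorem 8.6, we seek to establish (B.20) under the appropriate condition on $\tau$ and $N$
for large enough values of $\alpha$.

Fix $\alpha > 0$ and $\tau > 1$. By substituting (6.4)-(6.5) into the expression (10.18), we
have

\[
p^*_\alpha(i) = \frac{i^{-\alpha} \Big( \sum_{j \neq i} j^{-\alpha} \Big)^{\tau}}{\sum_{k=1}^{N} k^{-\alpha} \Big( \sum_{j \neq k} j^{-\alpha} \Big)^{\tau}}, \qquad i = 1, \ldots, N, \tag{B.45}
\]

where we have denoted by $j \neq i$ the set of elements $j$ in $\mathcal{N}$ which are different from $i$.
As a direct application of Lemma B.1, it follows that

\[
i^{-\alpha} \Big( \sum_{j \neq i} j^{-\alpha} \Big)^{\tau} \sim i^{-\alpha} \Big( \min_{j \neq i} j \Big)^{-\alpha \tau} =
\begin{cases}
2^{-\alpha \tau}, & i = 1 \\
i^{-\alpha}, & i = 2, \ldots, N
\end{cases} \tag{B.46}
\]

and therefore by Lemma B.2, we find

\[
\sum_{i=1}^{N} i^{-\alpha} \Big( \sum_{j \neq i} j^{-\alpha} \Big)^{\tau} \sim 2^{-\alpha \tau} + \sum_{i=2}^{N} i^{-\alpha}. \tag{B.47}
\]

Combining (B.45), (B.46) and (B.47) yields

\[
p^*_\alpha(i) \sim
\begin{cases}
2^{-\alpha(\tau-1)}, & i = 1 \\
\big( \frac{i}{2} \big)^{-\alpha}, & i = 2, \ldots, N.
\end{cases} \tag{B.48}
\]

From the expressions (B.48), it is a simple matter to check that

\[
\min_{i=1,\ldots,N} p^*_\alpha(i) \sim \min \Big( 2^{-\alpha(\tau-1)}, \min_{i=2,\ldots,N} \Big( \frac{i}{2} \Big)^{-\alpha} \Big)
= \min \Big( 2^{-\alpha(\tau-1)}, \Big( \frac{N}{2} \Big)^{-\alpha} \Big). \tag{B.49}
\]

Finally, we note from (B.1) and (B.49) that

\[
C_\alpha(N) N^\alpha \cdot \min_{i=1,\ldots,N} p^*_\alpha(i) \sim \min \Big( \Big( \frac{N}{2^{\tau-1}} \Big)^{\alpha}, 2^{\alpha} \Big)
\]

and by the enforced condition (10.19), this asymptotics reduces to

\[
C_\alpha(N) N^\alpha \cdot \min_{i=1,\ldots,N} p^*_\alpha(i) \sim \Big( \frac{N}{2^{\tau-1}} \Big)^{\alpha}.
\]

Hence, the condition (B.20) is satisfied for large enough values of $\alpha$.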
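The argument above can be illustrated numerically under explicit assumptions. The sketch below assumes that the output pmf of (10.18) is proportional to $p(i)(1 - p(i))^{T}$ for an integer exponent $T$ playing the role of the exponent in (B.45), and that the condition (10.19) amounts to $N < 2^{T-1}$; both are assumptions made for this illustration, as are the function names.

```python
from fractions import Fraction


def output_pmf(N, alpha, tau):
    """pmf proportional to i^(-alpha) * (sum_{j != i} j^(-alpha))^tau, i = 1..N."""
    x = [Fraction(1, i ** alpha) for i in range(1, N + 1)]
    total = sum(x)
    w = [xi * (total - xi) ** tau for xi in x]
    norm = sum(w)
    return [wi / norm for wi in w]


def scaled_minimum(N, alpha, tau):
    """C_alpha(N) * N^alpha * min_i p*_alpha(i), the left-hand side of (B.20)."""
    c = sum(Fraction(1, i ** alpha) for i in range(1, N + 1))
    return c * N ** alpha * min(output_pmf(N, alpha, tau))


N, tau = 3, 3                                    # N = 3 < 2^(tau - 1) = 4
for alpha in (5, 10, 20, 40):
    lhs = scaled_minimum(N, alpha, tau)
    print(alpha, float(lhs), float(lhs / Fraction(N, 2 ** (tau - 1)) ** alpha))
```

The first printed column falls below 1 and the second approaches 1, matching the reduced asymptotics $(N / 2^{T-1})^{\alpha}$ under the assumed condition.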
Appendix C


Proofs  of  Theorems  8.5  and  8.11 


C.l  A  proof  of  Theorem  8.5 

To lighten the notation, we shall write $p^*_\varepsilon$ to denote $p^*_{LRU,\varepsilon}$. From Proposition 8.4, the
comparison  p*  -<  p:  does  not  hold  whenever  5(e)  >  -4=^,  or  equivalently,  whenever 


\[
p^*_\varepsilon(1) < \varepsilon. \tag{C.1}
\]


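Under the pmf (8.16), we find from (8.10) that

\[
p_\varepsilon(i) \, m(i; p_\varepsilon) = \frac{(N-2)!}{(N-M-1)!} \cdot \frac{(1 - (N-1)\varepsilon) \varepsilon^{M}}{\prod_{k=1}^{M-1} (1 - k\varepsilon)} \cdot a(i) \tag{C.2}
\]

with

\[
a(i) =
\begin{cases}
N - 1 & \text{if } i = 1 \\
1 + \frac{(N-M-1)\varepsilon}{1 - (N-1)\varepsilon} + \sum_{\ell=1}^{M-1} \prod_{k=\ell}^{M-1} \frac{1 - k\varepsilon}{(N-k)\varepsilon} & \text{if } i = 2, \ldots, N.
\end{cases} \tag{C.3}
\]

Substituting (C.2)-(C.3) into (5.4), we get

\[
\begin{aligned}
p^*_\varepsilon(1) &= \Big[ 2 + \frac{(N-M-1)\varepsilon}{1 - (N-1)\varepsilon} + \sum_{\ell=1}^{M-1} \prod_{k=\ell}^{M-1} \frac{1 - k\varepsilon}{(N-k)\varepsilon} \Big]^{-1} \\
&< \Big[ \sum_{\ell=1}^{M-1} \prod_{k=\ell}^{M-1} \frac{1 - k\varepsilon}{(N-k)\varepsilon} \Big]^{-1} \\
&\leq \Big[ \sum_{\ell=1}^{M-1} \frac{1 - \ell\varepsilon}{(N-\ell)\varepsilon} \Big]^{-1}
\end{aligned} \tag{C.4}
\]

where the last inequality follows from the fact that for each $k = 1, \ldots, M-1$, $\frac{1 - k\varepsilon}{(N-k)\varepsilon} > 1$
since $\varepsilon < \frac{1}{N}$.

Consequently, the condition (C.1) will hold if

\[
1 \leq \sum_{\ell=1}^{M-1} \frac{1 - \ell\varepsilon}{N - \ell}
\]

or equivalently, if

\[
\varepsilon \leq \frac{\sum_{\ell=1}^{M-1} \frac{1}{N-\ell} - 1}{\sum_{\ell=1}^{M-1} \frac{\ell}{N-\ell}}.
\]

Hence, provided that $N$ and $M$ satisfy the condition $\sum_{\ell=1}^{M-1} \frac{1}{N-\ell} > 1$, there exists $\varepsilon$ in
the range (8.23) for which the comparison $p^*_\varepsilon \prec p_\varepsilon$ does not hold. ■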
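The closed form for $p^*_\varepsilon(1)$ and the sufficient condition derived above can be checked on a small instance. The sketch below enumerates the LRU cache states under the input pmf $(1 - (N-1)\varepsilon, \varepsilon, \ldots, \varepsilon)$, assuming the classical product-form stationary pmf of LRU under the IRM and an output pmf proportional to $p(i) m(i)$. The function names and the instance $N = 5$, $M = 4$, $\varepsilon = 1/40$ are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import permutations


def lru_output_prob_of_doc1(p, M):
    """p*(1) = p(1) m(1) / sum_i p(i) m(i) for LRU under the IRM, by enumeration."""
    N = len(p)
    missed = [Fraction(0)] * N                   # missed[i] accumulates p(i) * m(i)
    for s in permutations(range(N), M):
        weight, used = Fraction(1), Fraction(0)
        for k, doc in enumerate(s):
            weight *= p[doc]
            used += p[doc]
            if k < M - 1:
                weight /= 1 - used
        for i in range(N):
            if i not in s:
                missed[i] += p[i] * weight
    return missed[0] / sum(missed)


def closed_form(N, M, eps):
    """First line of (C.4): the reciprocal of 1 + a(2), with a(.) as in (C.3)."""
    total = 2 + (N - M - 1) * eps / (1 - (N - 1) * eps)
    for l in range(1, M):
        prod = Fraction(1)
        for k in range(l, M):
            prod *= (1 - k * eps) / ((N - k) * eps)
        total += prod
    return 1 / total


N, M, eps = 5, 4, Fraction(1, 40)
p = [1 - (N - 1) * eps] + [eps] * (N - 1)
threshold = (sum(Fraction(1, N - l) for l in range(1, M)) - 1) / sum(
    Fraction(l, N - l) for l in range(1, M))
print(float(lru_output_prob_of_doc1(p, M)), float(closed_form(N, M, eps)), float(threshold))
```

Here $\sum_{\ell=1}^{M-1} 1/(N-\ell) = 13/12 > 1$, the chosen $\varepsilon$ lies below the threshold, and the most popular document indeed ends up with output probability smaller than $\varepsilon$.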


C.2  A  proof  of  Theorem  8.11 


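First, to simplify the notation, the output popularity pmf $p^*_{CL,\varepsilon}$ will be denoted by $p^*_\varepsilon$.
The proof of this theorem proceeds along the same lines as the proof of Theorem 8.5.
We seek $\varepsilon$ such that the condition (C.1) holds.

For the input pmf (8.16), we have from (8.33) that

\[
p_\varepsilon(i) \, m(i; p_\varepsilon) = \frac{(N-2)!}{(N-M-1)!} \cdot \frac{(1 - (N-1)\varepsilon) \varepsilon^{M(M+1)/2}}{K_{CL}} \cdot b(i) \tag{C.5}
\]

with

\[
b(i) =
\begin{cases}
N - 1 & \text{if } i = 1 \\
\frac{(N-M-1)\varepsilon}{1 - (N-1)\varepsilon} + \sum_{\ell=1}^{M} \Big( \frac{1 - (N-1)\varepsilon}{\varepsilon} \Big)^{\ell-1} & \text{if } i = 2, \ldots, N.
\end{cases} \tag{C.6}
\]

Combining (C.5)-(C.6) with (5.4), we find

\[
p^*_\varepsilon(1) = \Big[ 1 + \frac{(N-M-1)\varepsilon}{1 - (N-1)\varepsilon} + \sum_{\ell=1}^{M} \Big( \frac{1 - (N-1)\varepsilon}{\varepsilon} \Big)^{\ell-1} \Big]^{-1}. \tag{C.7}
\]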


(C.8) 


This last inequality indeed holds when $\varepsilon$ is in the range (8.35) and the desired result
follows.
Appendix D


Proofs  of  Proposition  9.6  and  Theorem  9.7 

D.l  A  proof  of  Proposition  9.6 

To facilitate the proof, we shall need the following notion of stack position: Fix $i = 1, \ldots, N$.
For each $t = 0, 1, \ldots$, let the rv $X^a_t(i)$ denote the position of document $i$
in the LRU stack $\Omega_t$ at time $t$ associated with the request stream $\{R^a_t,\ t = 0, 1, \ldots\}$.
From the stack operation (9.16), the sequence $\{X^a_t(i),\ t = 0, 1, \ldots\}$ is seen to evolve
according to the recursion

\[
X^a_{t+1}(i) =
\begin{cases}
1 & \text{if } D_t = X^a_t(i) \\
X^a_t(i) & \text{if } D_t < X^a_t(i) \\
X^a_t(i) + 1 & \text{if } D_t > X^a_t(i)
\end{cases} \tag{D.1}
\]

for all $t = 0, 1, \ldots$ with the initial position $X^a_0(i)$ given and assumed independent of the
i.i.d. stack distances $\{D_t,\ t = 0, 1, \ldots\}$.

By independence of the rvs $\{D_t,\ t = 0, 1, \ldots\}$, it follows from (D.1) that the sequence
$\{X^a_t(i),\ t = 0, 1, \ldots\}$ is a Markov chain on the state space $\{1, \ldots, N\}$ with
one-step transition probability matrix $P^a = (P^a_{kj},\ j, k = 1, \ldots, N)$ given by

\[
\begin{aligned}
P^a_{kj} &= \mathbf{P}[X^a_{t+1}(i) = j \mid X^a_t(i) = k] \\
&= \delta(j,1) \mathbf{P}[D_t = k] + \delta(j,k) \mathbf{P}[D_t < k] + \delta(j,k+1) \mathbf{P}[D_t > k] \\
&= \delta(j,1) a_k + \delta(j,k) \cdot \Big( \sum_{\ell=1}^{k-1} a_\ell \Big) + \delta(j,k+1) \cdot \Big( \sum_{\ell=k+1}^{N} a_\ell \Big)
\end{aligned}
\]

for $j, k = 1, \ldots, N$, where we set $\delta(x, y) = \mathbf{1}[x = y]$ for any $x, y \in \mathbb{R}$. This transition
matrix $P^a$ is a doubly stochastic matrix, i.e., $\sum_{j=1}^{N} P^a_{kj} = \sum_{k=1}^{N} P^a_{kj} = 1$ for $j, k = 1, \ldots, N$.
An invariant distribution for $P^a$ then exists, is unique and is given by the
uniform pmf $u$ on $\{1, \ldots, N\}$.
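The structure of the transition matrix can be made concrete with a short sketch. It builds the matrix from a stack distance pmf using the three transitions of the recursion (D.1), then checks that rows and columns sum to one and that the uniform pmf is invariant. The function name and the sample pmf are hypothetical.

```python
from fractions import Fraction


def stack_transition_matrix(a):
    """P[k-1][j-1] = P[X_{t+1} = j | X_t = k] for a stack distance pmf a = (a_1..a_N)."""
    N = len(a)
    P = [[Fraction(0)] * N for _ in range(N)]
    for k in range(1, N + 1):
        P[k - 1][0] += a[k - 1]                  # D_t = k: the document moves to the top
        P[k - 1][k - 1] += sum(a[:k - 1])        # D_t < k: the document keeps its position
        if k < N:
            P[k - 1][k] += sum(a[k:])            # D_t > k: the document is pushed down by one
    return P


a = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
P = stack_transition_matrix(a)
N = len(a)
row_sums = [sum(row) for row in P]
col_sums = [sum(P[k][j] for k in range(N)) for j in range(N)]
uniform = [Fraction(1, N)] * N
u_times_P = [sum(uniform[k] * P[k][j] for k in range(N)) for j in range(N)]
print(row_sums, col_sums, u_times_P)
```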

The condition $a_N > 0$ is necessary and sufficient for the Markov chain $\{X^a_t(i),\ t = 0, 1, \ldots\}$
to be irreducible on its finite state space $\{1, \ldots, N\}$, hence to be positive recurrent.
For $0 < a_N < 1$, the Markov chain $\{X^a_t(i),\ t = 0, 1, \ldots\}$ is aperiodic while
for $a_N = 1$, it is periodic with period $N$. Regardless of its periodicity [36, Thm. 6.4.3,
p. 227], when $a_N > 0$, the fraction of time that $\{X^a_t(i),\ t = 0, 1, \ldots\}$ spends in a given
state $k$ will a.s. converge to the corresponding entry of the invariant distribution. The latter
being the uniform pmf on $\{1, \ldots, N\}$, we conclude that

\[
\lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[X^a_\tau(i) = k] = \frac{1}{N} \quad \text{a.s.}, \qquad k = 1, \ldots, N. \tag{D.2}
\]


Moreover, in the stationary regime, when $a_N > 0$, we have

\[
\mathbf{P}[X^a_t(i) = k] = \frac{1}{N}, \qquad k = 1, \ldots, N,
\]

for all $i = 1, \ldots, N$. This implies that in stationarity, the stack rvs $\{\Omega_t,\ t = 0, 1, \ldots\}$
are uniformly distributed over $\Lambda(N;\mathcal{N})$.

With the fact (D.2), we are now ready to prove Proposition 9.6: Fix $i = 1, \ldots, N$.
Recall that $R^a_t = i$ if and only if $X^a_t(i) = 1$ since this corresponds to document $i$ being
in position 1 of the LRU stack $\Omega_t$ associated with the request stream $R^a$. Under the
assumption $a_N > 0$, we can combine this observation with the convergence (D.2) to get

\[
p_a(i) = \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[R^a_\tau = i]
= \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}[X^a_\tau(i) = 1] = \frac{1}{N} \quad \text{a.s.}
\]

and the desired result is obtained. ■
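The conclusion that every document is requested with long-run frequency $1/N$ can be observed by simulation. The sketch below is a minimal illustration: it draws i.i.d. stack distances, requests the document found at that position of the stack, and moves it to the top. The function name, the sample pmf, the horizon and the seed are hypothetical choices.

```python
import random


def simulate_lrusm(a, T, seed=0):
    """Simulate T requests of the LRU stack model; return empirical request frequencies."""
    N = len(a)
    rng = random.Random(seed)
    stack = list(range(N))                       # stack[0] is the top position
    counts = [0] * N
    for d in rng.choices(range(N), weights=a, k=T):   # d + 1 plays the role of D_t
        doc = stack.pop(d)                       # the document at position D_t is requested
        stack.insert(0, doc)                     # and moved to the top of the stack
        counts[doc] += 1
    return [c / T for c in counts]


freqs = simulate_lrusm([0.5, 0.25, 0.15, 0.1], 200000, seed=1)
print(freqs)
```

Although the stack distance pmf is far from uniform, the empirical popularity of each document is close to $1/N = 0.25$, so that under this model all locality of reference comes from temporal correlations.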

D.2  A  proof  of  Theorem  9.7 

Throughout, for each $i = 1, \ldots, N$, we set

\[
V^a_t(i) = \mathbf{1}[R^a_t = i], \qquad t = 0, 1, \ldots, \tag{D.3}
\]

and for each $t = 0, 1, \ldots$, write $V^{a,t}(i) = (V^a_0(i), \ldots, V^a_t(i))$.

Fix $i = 1, \ldots, N$. In order to establish the CIS property of the sequence $\{V^a_t(i),\ t = 0, 1, \ldots\}$,
it suffices to show that for each $t = 0, 1, \ldots$, the inequality

\[
\mathbf{P}[V^a_{t+1}(i) = 1 \mid V^{a,t}(i) = x^t] \leq \mathbf{P}[V^a_{t+1}(i) = 1 \mid V^{a,t}(i) = y^t] \tag{D.4}
\]

holds for any pair of vectors $x^t = (x_0, \ldots, x_t)$ and $y^t = (y_0, \ldots, y_t)$ in $\{0,1\}^{t+1}$
satisfying $x^t \leq y^t$ componentwise.

Our first task is to provide a simpler expression for the probabilities of interest. To
that end, for $\ell = 1, \ldots, N$, we introduce the quantities $\{P_t(\ell),\ t = 0, 1, \ldots\}$ given by

\[
P_t(\ell) := \mathbf{P}[X^a_{t+1}(i) = 1 \mid X^a_0(i) = \ell,\ X^a_1(i) \neq 1, \ldots, X^a_t(i) \neq 1] \tag{D.5}
\]

for all $t = 1, 2, \ldots$ with

\[
P_0(\ell) := \mathbf{P}[X^a_1(i) = 1 \mid X^a_0(i) = \ell].
\]

Moreover, for each $t = 0, 1, \ldots$, and any non-zero element $x^t$ in $\{0,1\}^{t+1}$, we set

\[
\tau(x^t) := \max(s = 0, \ldots, t : x_s = 1).
\]
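Proposition D.1 For each $t = 0, 1, \ldots$ and any non-zero vector $x^t$ in $\{0,1\}^{t+1}$, it holds
that

\[
\mathbf{P}[V^a_{t+1}(i) = 1 \mid V^{a,t}(i) = x^t] = P_{t - \tau(x^t)}(1). \tag{D.6}
\]

Proof. Fix $t = 0, 1, \ldots$ and consider a non-zero vector $x^t = (x_0, \ldots, x_t)$ in $\{0,1\}^{t+1}$.
Writing $\tau = \tau(x^t)$ to simplify the notation, we see from the definitions that

\[
\begin{aligned}
[V^{a,t}(i) = x^t]
&= [V^{a,\tau-1}(i) = x^{\tau-1},\ V^a_\tau(i) = 1,\ V^a_{\tau+1}(i) = 0, \ldots, V^a_t(i) = 0] \\
&= [V^{a,\tau-1}(i) = x^{\tau-1},\ R^a_\tau = i,\ R^a_{\tau+1} \neq i, \ldots, R^a_t \neq i] \\
&= [V^{a,\tau-1}(i) = x^{\tau-1},\ X^a_\tau(i) = 1,\ X^a_{\tau+1}(i) \neq 1, \ldots, X^a_t(i) \neq 1]
\end{aligned} \tag{D.7}
\]

where we have set $x^{\tau-1} = (x_0, \ldots, x_{\tau-1})$ and that

\[
[V^a_{t+1}(i) = 1] = [X^a_{t+1}(i) = 1]. \tag{D.8}
\]

Assume first that $\tau < t$. Now observe that the event $[V^{a,\tau-1}(i) = x^{\tau-1},\ X^a_\tau(i) = 1]$
is determined by the rvs $X^a_0(i), \ldots, X^a_\tau(i)$. Thus, by preconditioning with respect to
these rvs, we readily conclude from (D.7) that

\[
\begin{aligned}
\mathbf{P}[V^{a,t}(i) = x^t]
&= \mathbf{P}[V^{a,\tau-1}(i) = x^{\tau-1},\ X^a_\tau(i) = 1,\ X^a_{\tau+1}(i) \neq 1, \ldots, X^a_t(i) \neq 1] \\
&= \mathbf{P}[V^{a,\tau-1}(i) = x^{\tau-1},\ X^a_\tau(i) = 1] \\
&\qquad \cdot \mathbf{P}[X^a_{\tau+1}(i) \neq 1, \ldots, X^a_t(i) \neq 1 \mid X^a_\tau(i) = 1]
\end{aligned} \tag{D.9}
\]

where in the last step we used the fact that the stack position sequence $\{X^a_t(i),\ t = 0, 1, \ldots\}$
is a Markov chain. Similarly, this time making use of (D.7) and (D.8), we get

\[
\begin{aligned}
\mathbf{P}[V^{a,t}(i) = x^t,\ V^a_{t+1}(i) = 1]
&= \mathbf{P}[V^{a,\tau-1}(i) = x^{\tau-1},\ X^a_\tau(i) = 1,\ X^a_{\tau+1}(i) \neq 1, \ldots, X^a_t(i) \neq 1,\ X^a_{t+1}(i) = 1] \\
&= \mathbf{P}[V^{a,\tau-1}(i) = x^{\tau-1},\ X^a_\tau(i) = 1] \\
&\qquad \cdot \mathbf{P}[X^a_{\tau+1}(i) \neq 1, \ldots, X^a_t(i) \neq 1,\ X^a_{t+1}(i) = 1 \mid X^a_\tau(i) = 1].
\end{aligned} \tag{D.10}
\]

It is now plain that

\[
\begin{aligned}
\mathbf{P}[V^a_{t+1}(i) = 1 \mid V^{a,t}(i) = x^t]
&= \frac{\mathbf{P}[V^{a,t}(i) = x^t,\ V^a_{t+1}(i) = 1]}{\mathbf{P}[V^{a,t}(i) = x^t]} \\
&= \frac{\mathbf{P}[X^a_{\tau+1}(i) \neq 1, \ldots, X^a_t(i) \neq 1,\ X^a_{t+1}(i) = 1 \mid X^a_\tau(i) = 1]}{\mathbf{P}[X^a_{\tau+1}(i) \neq 1, \ldots, X^a_t(i) \neq 1 \mid X^a_\tau(i) = 1]} \\
&= \frac{\mathbf{P}[X^a_\tau(i) = 1,\ X^a_{\tau+1}(i) \neq 1, \ldots, X^a_t(i) \neq 1,\ X^a_{t+1}(i) = 1]}{\mathbf{P}[X^a_\tau(i) = 1,\ X^a_{\tau+1}(i) \neq 1, \ldots, X^a_t(i) \neq 1]} \\
&= \mathbf{P}[X^a_{t+1}(i) = 1 \mid X^a_\tau(i) = 1,\ X^a_{\tau+1}(i) \neq 1, \ldots, X^a_t(i) \neq 1]
\end{aligned}
\]

and the desired conclusion follows by the homogeneity of the Markov chain $\{X^a_t(i),\ t = 0, 1, \ldots\}$.

The case $\tau = t$ is straightforward. ■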


D.2.1  Some  preliminary  calculations 

Since the expressions for the probabilities of interest involve the stack position
sequences $\{X^a_t(i),\ t = 0, 1, \ldots\}$, $i = 1, \ldots, N$, associated with the LRUSM request
stream $R^a$, we shall need some basic facts concerning them in order to show the desired
CIS property. Throughout the discussion of the results in this and the next sections, we
fix the index $i = 1, \ldots, N$ and the pmf $a$, and lighten the notation by writing $X_t$ to
denote the stack position $X^a_t(i)$ of the document $i$ at time $t$. For each $t = 0, 1, \ldots$, let
$A_t$ denote the event $[X_t \neq D_t, \ldots, X_0 \neq D_0]$.

Recall that the stack distance rvs $\{D_t,\ t = 0, 1, \ldots\}$ associated with $\{R^a_t,\ t = 0, 1, \ldots\}$
are i.i.d. rvs distributed according to the generic rv $D$ with pmf $a$. We set

\[
\alpha(y) = \mathbf{P}[D < y] \quad \text{and} \quad \beta(y) = \mathbf{P}[D > y], \qquad y = 0, 1, \ldots, N,
\]

and define the quantities

\[
Q_t(y; \ell) := \mathbf{P}[X_t = y,\ A_{t-1},\ X_0 = \ell], \qquad y, \ell = 1, \ldots, N,
\]

for each $t = 1, 2, \ldots$.

Proposition D.2 For each $t = 1, 2, \ldots$ and $\ell = 1, \ldots, N$, it holds that

\[
Q_{t+1}(y; \ell) = \alpha(y) Q_t(y; \ell) + \beta(y-1) Q_t(y-1; \ell) \tag{D.11}
\]

for all $y = 1, \ldots, N$.

Proof. Fix $t = 1, 2, \ldots$ and $\ell = 1, \ldots, N$. The case $y = 1$ requires a separate analysis:
The evolution (D.1) precludes $X_{t+1} = 1$ under the condition $X_t \neq D_t$. Therefore, we
must have $\mathbf{P}[X_{t+1} = 1,\ A_t,\ X_0 = \ell] = 0$ and the expression (D.11) holds as we observe
that $\alpha(1) = 0$ and $\mathbf{P}[X_t = 0,\ A_{t-1},\ X_0 = \ell] = 0$.

Next we turn to the case $y = 2, \ldots, N$. The evolution (D.1) implies the relation
$X_{t+1} = X_t$ if $D_t < X_t$ and $X_{t+1} = X_t + 1$ if $X_t < D_t$. Thus, the event $[X_{t+1} = y,\ X_t \neq D_t]$
is the union of the two disjoint events $[X_t = y-1,\ X_t < D_t]$ and $[X_t = y,\ D_t < X_t]$.
This leads naturally to

\[
\begin{aligned}
\mathbf{P}[X_{t+1} = y,\ A_t,\ X_0 = \ell]
&= \mathbf{P}[X_{t+1} = y,\ X_t \neq D_t,\ A_{t-1},\ X_0 = \ell] \\
&= \mathbf{P}[X_t = y-1,\ X_t < D_t,\ A_{t-1},\ X_0 = \ell] \\
&\qquad + \mathbf{P}[X_t = y,\ D_t < X_t,\ A_{t-1},\ X_0 = \ell] \\
&= \mathbf{P}[X_t = y-1,\ y-1 < D_t,\ A_{t-1},\ X_0 = \ell] \\
&\qquad + \mathbf{P}[X_t = y,\ D_t < y,\ A_{t-1},\ X_0 = \ell] \\
&= \mathbf{P}[y-1 < D_t] \, \mathbf{P}[X_t = y-1,\ A_{t-1},\ X_0 = \ell] \\
&\qquad + \mathbf{P}[D_t < y] \, \mathbf{P}[X_t = y,\ A_{t-1},\ X_0 = \ell]
\end{aligned}
\]

as we make use of the fact that the rv $D_t$ is independent of the rvs $\{X_0, D_0, \ldots, X_{t-1}, D_{t-1}, X_t\}$. ■

The case $t = 0$ in (D.11) is somewhat different but by essentially the same arguments,
we get that

\[
Q_1(y; \ell) = \big( \delta(y, \ell) \alpha(\ell) + \delta(y, \ell+1) \beta(\ell) \big) \cdot \mathbf{P}[X_0 = \ell] \tag{D.12}
\]

for arbitrary $y, \ell = 1, \ldots, N$. This follows from the fact that constraints exist between
the stack positions $X_0$ and $X_1$ on the event $A_0$.

D.2.2  Monotonicity  under  the  likelihood  ratio  ordering 

We also make use of the so-called likelihood ratio ordering, which is now defined.

Definition D.3 For $\mathbb{N}$-valued rvs $X$ and $Y$, we say that $X$ is smaller than $Y$ according
to the likelihood ratio (lr) ordering, written $X \leq_{lr} Y$, if

\[
\mathbf{P}[X = y] \, \mathbf{P}[Y = x] \leq \mathbf{P}[X = x] \, \mathbf{P}[Y = y] \tag{D.13}
\]

for all $x$ and $y$ in $\mathbb{N}$ with $x < y$.

The likelihood ratio ordering is stronger than the usual stochastic ordering [59, Thm.
1.C.2, p. 29], i.e., if the $\mathbb{N}$-valued rvs $X$ and $Y$ satisfy $X \leq_{lr} Y$, then $X \leq_{st} Y$.

In what follows, we shall find it convenient to use the following notation: If $X$ is
an $\mathbb{N}$-valued rv and $A$ is an event, then $[X|A]$ denotes any rv whose distribution is the
conditional distribution of $X$ given $A$. The comparison

\[
[X|A] \leq_{lr} [X|B]
\]

for some other event $B$ then amounts to

\[
\mathbf{P}[X = y \mid A] \, \mathbf{P}[X = x \mid B] \leq \mathbf{P}[X = x \mid A] \, \mathbf{P}[X = y \mid B] \tag{D.14}
\]

whenever $x < y$ in $\mathbb{N}$, or equivalently

\[
\mathbf{P}[X = y, A] \, \mathbf{P}[X = x, B] \leq \mathbf{P}[X = x, A] \, \mathbf{P}[X = y, B] \tag{D.15}
\]

provided $\mathbf{P}[A] > 0$ and $\mathbf{P}[B] > 0$. With the likelihood ratio ordering, we can now state
the following

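Theorem D.4 For $\ell, \zeta = 1, \ldots, N$ with $\ell < \zeta$, it holds that

\[
[X_t \mid A_{t-1}, X_0 = \ell] \leq_{lr} [X_t \mid A_{t-1}, X_0 = \zeta], \qquad t = 1, 2, \ldots. \tag{D.16}
\]

Before giving a proof we observe that the comparison (D.16) holds for some $t = 1, 2, \ldots$ if

\[
\begin{aligned}
&\mathbf{P}[X_t = y,\ A_{t-1},\ X_0 = \ell] \, \mathbf{P}[X_t = x,\ A_{t-1},\ X_0 = \zeta] \\
&\qquad \leq \mathbf{P}[X_t = x,\ A_{t-1},\ X_0 = \ell] \, \mathbf{P}[X_t = y,\ A_{t-1},\ X_0 = \zeta]
\end{aligned} \tag{D.17}
\]

for $x, y = 1, \ldots, N$ with $x < y$.

Proof. The proof proceeds by induction on $t = 1, 2, \ldots$. Throughout we fix arbitrary
$\ell, \zeta = 1, \ldots, N$ such that $\ell < \zeta$.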
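The basis step: For $t = 1$ the comparison (D.16) (when interpreted through (D.17))
requires that

\[
Q_1(y; \ell) Q_1(x; \zeta) \leq Q_1(x; \ell) Q_1(y; \zeta) \tag{D.18}
\]

for all $x, y = 1, \ldots, N$ with $x < y$.

In view of (D.12), the inequality (D.18) is certainly implied by

\[
\begin{aligned}
&\big( \delta(y, \ell) \alpha(\ell) + \delta(y, \ell+1) \beta(\ell) \big) \big( \delta(x, \zeta) \alpha(\zeta) + \delta(x, \zeta+1) \beta(\zeta) \big) \\
&\qquad \leq \big( \delta(x, \ell) \alpha(\ell) + \delta(x, \ell+1) \beta(\ell) \big) \big( \delta(y, \zeta) \alpha(\zeta) + \delta(y, \zeta+1) \beta(\zeta) \big),
\end{aligned}
\]

an inequality we can rewrite as

\[
\begin{aligned}
&\delta(y, \ell) \delta(x, \zeta) \alpha(\ell) \alpha(\zeta) + \delta(y, \ell) \delta(x, \zeta+1) \alpha(\ell) \beta(\zeta) \\
&\quad + \delta(y, \ell+1) \delta(x, \zeta) \beta(\ell) \alpha(\zeta) + \delta(y, \ell+1) \delta(x, \zeta+1) \beta(\ell) \beta(\zeta) \\
&\qquad \leq \delta(x, \ell) \delta(y, \zeta) \alpha(\ell) \alpha(\zeta) + \delta(x, \ell) \delta(y, \zeta+1) \alpha(\ell) \beta(\zeta) \\
&\qquad\quad + \delta(x, \ell+1) \delta(y, \zeta) \beta(\ell) \alpha(\zeta) + \delta(x, \ell+1) \delta(y, \zeta+1) \beta(\ell) \beta(\zeta).
\end{aligned} \tag{D.19}
\]

Comparing like terms in (D.19), we see that (D.18) will hold since the four inequalities

\[
\begin{aligned}
\delta(y, \ell) \delta(x, \zeta) &\leq \delta(x, \ell) \delta(y, \zeta), \\
\delta(y, \ell) \delta(x, \zeta+1) &\leq \delta(x, \ell) \delta(y, \zeta+1), \\
\delta(y, \ell+1) \delta(x, \zeta) &\leq \delta(x, \ell+1) \delta(y, \zeta)
\end{aligned}
\]

and

\[
\delta(y, \ell+1) \delta(x, \zeta+1) \leq \delta(x, \ell+1) \delta(y, \zeta+1)
\]

all hold under the constraints $x < y$ and $\ell < \zeta$.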

The induction step: Now assuming that (D.16) holds for some $t = 1, 2, \ldots$, namely

\[
[X_t \mid A_{t-1}, X_0 = \ell] \leq_{lr} [X_t \mid A_{t-1}, X_0 = \zeta], \tag{D.20}
\]

we seek to show that

\[
[X_{t+1} \mid A_t, X_0 = \ell] \leq_{lr} [X_{t+1} \mid A_t, X_0 = \zeta]. \tag{D.21}
\]

As discussed earlier, the comparison (D.20) is equivalent to

\[
Q_t(y'; \ell) Q_t(x'; \zeta) \leq Q_t(x'; \ell) Q_t(y'; \zeta) \tag{D.22}
\]

for all $x', y' = 1, \ldots, N$ with $x' < y'$, while the desired comparison (D.21) is equivalent
to

\[
Q_{t+1}(y; \ell) Q_{t+1}(x; \zeta) \leq Q_{t+1}(x; \ell) Q_{t+1}(y; \zeta) \tag{D.23}
\]

for all $x, y = 1, \ldots, N$ with $x < y$.

To establish (D.23), we fix $x, y = 1, \ldots, N$ with $x < y$. From Proposition D.2, we
have the expressions

\[
\begin{aligned}
Q_{t+1}(y; \ell) Q_{t+1}(x; \zeta) = {} & \alpha(y) \alpha(x) Q_t(y; \ell) Q_t(x; \zeta) && \text{(D.24)} \\
& + \alpha(y) \beta(x-1) Q_t(y; \ell) Q_t(x-1; \zeta) && \text{(D.25)} \\
& + \beta(y-1) \alpha(x) Q_t(y-1; \ell) Q_t(x; \zeta) && \text{(D.26)} \\
& + \beta(y-1) \beta(x-1) Q_t(y-1; \ell) Q_t(x-1; \zeta) && \text{(D.27)}
\end{aligned}
\]

and

\[
\begin{aligned}
Q_{t+1}(x; \ell) Q_{t+1}(y; \zeta) = {} & \alpha(x) \alpha(y) Q_t(x; \ell) Q_t(y; \zeta) && \text{(D.28)} \\
& + \alpha(x) \beta(y-1) Q_t(x; \ell) Q_t(y-1; \zeta) && \text{(D.29)} \\
& + \beta(x-1) \alpha(y) Q_t(x-1; \ell) Q_t(y; \zeta) && \text{(D.30)} \\
& + \beta(x-1) \beta(y-1) Q_t(x-1; \ell) Q_t(y-1; \zeta). && \text{(D.31)}
\end{aligned}
\]

Comparing the last two expressions term by term, namely (D.24) with (D.28), (D.25)
with (D.30), (D.26) with (D.29), and (D.27) with (D.31), we conclude from (D.22) that
(D.23) holds. This completes the proof of the induction step.
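The induction can be mirrored numerically. The following sketch initializes the quantities $Q_1(y; \ell)$ from (D.12), iterates the recursion (D.11), and checks the inequality (D.17) for every admissible choice of indices. The initial position is taken uniform on $\{1, \ldots, N\}$, which is an assumption made for illustration, and the function names are hypothetical. Note that the sample pmf is deliberately not monotone, since the argument above does not use (9.22).

```python
from fractions import Fraction


def q_tables(a, T):
    """Return [Q_1, ..., Q_T] with Q_t[y][l] = Q_t(y; l), via (D.12) and then (D.11).

    X_0 is assumed uniform on {1..N}. Index 0 is padding so that Q_t(0; l) = 0.
    """
    N = len(a)
    alpha = [sum(a[:max(y - 1, 0)], Fraction(0)) for y in range(N + 1)]   # P[D < y]
    beta = [sum(a[y:], Fraction(0)) for y in range(N + 1)]                # P[D > y]
    q = [[Fraction(0)] * (N + 1) for _ in range(N + 1)]
    for l in range(1, N + 1):
        q[l][l] += alpha[l] * Fraction(1, N)
        if l < N:
            q[l + 1][l] += beta[l] * Fraction(1, N)
    tables = [q]
    for _ in range(T - 1):
        nxt = [[Fraction(0)] * (N + 1) for _ in range(N + 1)]
        for y in range(1, N + 1):
            for l in range(1, N + 1):
                nxt[y][l] = alpha[y] * q[y][l] + beta[y - 1] * q[y - 1][l]
        q = nxt
        tables.append(q)
    return tables


def lr_ordered(q):
    """Check (D.17): Q(y; l) Q(x; z) <= Q(x; l) Q(y; z) for all x < y and l < z."""
    N = len(q) - 1
    return all(q[y][l] * q[x][z] <= q[x][l] * q[y][z]
               for l in range(1, N + 1) for z in range(l + 1, N + 1)
               for x in range(1, N + 1) for y in range(x + 1, N + 1))


a = [Fraction(1, 10), Fraction(4, 10), Fraction(2, 10), Fraction(3, 10)]
print([lr_ordered(q) for q in q_tables(a, 6)])
```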


Before we can state the main results of this section, we pause for an easy technical
lemma.

Lemma D.5 Let $X$ and $Y$ be $\{1, \ldots, N\}$-valued rvs with $X \leq_{st} Y$, and let $D$ be another
$\{1, \ldots, N\}$-valued rv independent of $X$ and $Y$ with pmf $a = (a_1, \ldots, a_N)$, i.e.,
$\mathbf{P}[D = k] = a_k$, $k = 1, \ldots, N$. If the pmf $a$ satisfies the condition (9.22), then it holds
that

\[
\mathbf{P}[Y = D] \leq \mathbf{P}[X = D]. \tag{D.32}
\]

Proof. Set $b_\ell = a_\ell - a_{\ell+1}$ for $\ell = 1, \ldots, N-1$ and $b_N = a_N$, so that $a_k = \sum_{\ell=k}^{N} b_\ell$ for
each $k = 1, \ldots, N$. The independence of the rvs $X$ and $D$ leads to

\[
\begin{aligned}
\mathbf{P}[X = D] &= \sum_{j=1}^{N} \mathbf{P}[X = j] \, \mathbf{P}[D = j] \\
&= \sum_{j=1}^{N} \mathbf{P}[X = j] \, a_j \\
&= \sum_{j=1}^{N} \mathbf{P}[X = j] \Big( \sum_{\ell=j}^{N} b_\ell \Big) \\
&= \sum_{\ell=1}^{N} b_\ell \sum_{j=1}^{\ell} \mathbf{P}[X = j] \\
&= \sum_{\ell=1}^{N} b_\ell \, \mathbf{P}[X \leq \ell]
\end{aligned} \tag{D.33}
\]

and we similarly find

\[
\mathbf{P}[Y = D] = \sum_{\ell=1}^{N} b_\ell \, \mathbf{P}[Y \leq \ell]. \tag{D.34}
\]

Under the assumption $X \leq_{st} Y$, we have from (3.2) that $\mathbf{P}[Y \leq \ell] \leq \mathbf{P}[X \leq \ell]$ for all
$\ell = 1, \ldots, N$. It is plain from (D.33) and (D.34) that (D.32) holds once it is noted that
$b_\ell \geq 0$ for each $\ell = 1, \ldots, N$, under the monotonicity condition (9.22).
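The summation-by-parts identity (D.33) and the resulting inequality (D.32) are easy to check on a concrete example. The sketch below is a minimal illustration with hypothetical function names and sample pmfs; the pmf of $D$ is taken non-increasing, which is what the monotonicity condition (9.22) is assumed to require here.

```python
from fractions import Fraction


def prob_equal(px, a):
    """P[X = D] for independent X and D with pmfs px and a on {1..N}."""
    return sum(p * q for p, q in zip(px, a))


def prob_equal_via_cdf(px, a):
    """Right-hand side of (D.33): sum_l b_l P[X <= l] with b_l = a_l - a_{l+1}, b_N = a_N."""
    N = len(a)
    b = [a[l] - a[l + 1] for l in range(N - 1)] + [a[-1]]
    cdf, acc = [], Fraction(0)
    for p in px:
        acc += p
        cdf.append(acc)
    return sum(bl * Fl for bl, Fl in zip(b, cdf))


a = [Fraction(4, 10), Fraction(3, 10), Fraction(2, 10), Fraction(1, 10)]   # non-increasing
px = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4), Fraction(0)]
py = [Fraction(1, 4)] * 4            # X <=_st Y: the cdf of X dominates the cdf of Y
print(prob_equal(px, a), prob_equal_via_cdf(px, a), prob_equal(py, a))
```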


Proposition D.6 Assume the stack distance pmf $a$ to satisfy the condition (9.22). Then,
for $\ell, \zeta = 1, \ldots, N$ with $\ell < \zeta$, it holds that

\[
P_t(\zeta) \leq P_t(\ell), \qquad t = 0, 1, \ldots. \tag{D.35}
\]

Proof. First, consider the case $t = 0$. For any $\ell = 1, \ldots, N$, we find

\[
P_0(\ell) = \mathbf{P}[X_1 = 1 \mid X_0 = \ell] = a_\ell.
\]

Hence, for any $\ell, \zeta = 1, \ldots, N$ with $\ell < \zeta$, it holds that

\[
P_0(\zeta) \leq P_0(\ell)
\]

under the condition (9.22).

Fix $t = 1, 2, \ldots$. Recall from (D.1) that

\[
[X_1 \neq 1, \ldots, X_t \neq 1] = [X_0 \neq D_0, \ldots, X_{t-1} \neq D_{t-1}] \tag{D.36}
\]

and that

\[
[X_{t+1} = 1] = [X_t = D_t]. \tag{D.37}
\]

Using (D.36) and (D.37), for any $\ell = 1, \ldots, N$, we can rewrite (D.5) as

\[
\begin{aligned}
P_t(\ell) &= \mathbf{P}[X_t = D_t \mid X_0 = \ell,\ X_0 \neq D_0, \ldots, X_{t-1} \neq D_{t-1}] \\
&= \mathbf{P}[X_t = D_t \mid A_{t-1}, X_0 = \ell].
\end{aligned} \tag{D.38}
\]

Now, fix $\ell, \zeta = 1, \ldots, N$ with $\ell < \zeta$. Because the lr ordering implies the st ordering,
Theorem D.4 readily yields

\[
[X_t \mid A_{t-1}, X_0 = \ell] \leq_{st} [X_t \mid A_{t-1}, X_0 = \zeta]. \tag{D.39}
\]

Under the monotonicity condition (9.22), combining (D.39) with Lemma D.5 leads to

\[
\mathbf{P}[X_t = D_t \mid A_{t-1}, X_0 = \zeta] \leq \mathbf{P}[X_t = D_t \mid A_{t-1}, X_0 = \ell],
\]

and the desired conclusion (D.35) is obtained upon noting (D.38). ■

Proposition  D.7  Assume  the  stack  distance  pmf  a  to  satisfy  the  condition  (9.22).  Then, 
it  holds  that 

Pt+i(l)<Pt(l),  t  =  0,1,....  (D.40) 


Proof. The inequalities (D.40) are simple consequences of Proposition D.6. Fix $t = 1, 2, \ldots$. Under the observation that $[X_0 = 1, X_0 \ne D_0] = [X_1 = 2]$, we find via (D.38) that

$$\begin{aligned}
P_{t+1}(1) &= P[X_{t+1} = D_{t+1} \mid A_t, X_0 = 1] \\
&= P[X_{t+1} = D_{t+1} \mid X_0 = 1, X_0 \ne D_0, \ldots, X_t \ne D_t] \\
&= P[X_{t+1} = D_{t+1} \mid X_1 = 2, X_1 \ne D_1, \ldots, X_t \ne D_t] \\
&= P[X_t = D_t \mid A_{t-1}, X_0 = 2] \\
&= P_t(2)
\end{aligned} \tag{D.41}$$

where the fourth equality follows from the homogeneity of the Markov chain $\{X_t, t = 0, 1, \ldots\}$ and from the independence of the rvs $\{D_t, t = 0, 1, \ldots\}$. Invoking Proposition D.6 with (D.41), we get the inequality (D.40).

The case $t = 0$ uses essentially the same argument. We write

$$\begin{aligned}
P_1(1) &= P[X_1 = D_1 \mid X_0 = 1, X_0 \ne D_0] \\
&= P[X_0 = D_0 \mid X_0 = 2] \\
&= P_0(2)
\end{aligned} \tag{D.42}$$

and the inequality $P_1(1) \le P_0(1)$ simply follows from Proposition D.6 and (D.42). ■
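Propositions D.6 and D.7 can be illustrated by computing $P_t(\ell)$ exactly on a small example. The sketch below is not the author's construction: it assumes the usual LRU stack update (a tagged item at position $x$ moves to position 1 if $D_t = x$, down to $x + 1$ if $D_t > x$, and stays put if $D_t < x$), of which only the first rule is confirmed by (D.37). The pmf `a` and the helper names are hypothetical.

```python
from fractions import Fraction as F

def taboo_step(u, a):
    """Propagate the sub-probability vector u one step without a hit.
    Assumed LRU stack update: from position x the item moves to x + 1
    when D > x and stays at x when D < x (D = x is the excluded hit)."""
    n = len(a)
    v = [F(0)] * n
    for x in range(1, n + 1):            # positions are 1-based
        mass = u[x - 1]
        v[x - 1] += mass * sum(a[: x - 1])   # P[D < x]
        if x < n:
            v[x] += mass * sum(a[x:])        # P[D > x]
    return v

def P(t, ell, a):
    """P_t(ell) = P[X_t = D_t | A_{t-1}, X_0 = ell] as in (D.38)."""
    u = [F(0)] * len(a)
    u[ell - 1] = F(1)
    for _ in range(t):
        u = taboo_step(u, a)
    return sum(ux * ax for ux, ax in zip(u, a)) / sum(u)

a = [F(1, 2), F(3, 10), F(1, 5)]   # hypothetical non-increasing stack distance pmf
N = len(a)
for t in range(6):
    row = [P(t, ell, a) for ell in range(1, N + 1)]
    assert all(row[j + 1] <= row[j] for j in range(N - 1))   # (D.35)
    assert P(t + 1, 1, a) == P(t, 2, a)                      # (D.41), (D.42)
    assert P(t + 1, 1, a) <= P(t, 1, a)                      # (D.40)
```

The conditioning on the event $A_{t-1}$ is handled by propagating only the no-hit mass and normalizing at the end, which avoids enumerating sample paths.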


D.2.3  Main  proof 

We now return to proving Theorem 9.7 by showing that the sequences $\{V_t^a(i), t = 0, 1, \ldots\}$, $i = 1, \ldots, N$, are CIS: Fix $i = 1, \ldots, N$. Given $t = 0, 1, \ldots$, we need to show that (D.4) holds for any pair of vectors $x^t = (x_0, \ldots, x_t)$ and $y^t = (y_0, \ldots, y_t)$ in $\{0, 1\}^{t+1}$ satisfying $x^t \le y^t$ componentwise.

The case $t = 0$ is rather straightforward as (D.4) then reduces to establishing

$$P[V_1^a(i) = 1 \mid V_0^a(i) = 0] \le P[V_1^a(i) = 1 \mid V_0^a(i) = 1]$$

or equivalently,

$$P[X_1^a(i) = 1 \mid X_0^a(i) \ne 1] \le P[X_1^a(i) = 1 \mid X_0^a(i) = 1]. \tag{D.43}$$

Conditioning on $X_0^a(i)$, the condition (D.43) becomes

$$\sum_{\ell=2}^{N} P_0(\ell) \, P[X_0^a(i) = \ell \mid X_0^a(i) \ne 1] \le P_0(1),$$

which indeed holds by Proposition D.6.

From now on, as we assume $t = 1, 2, \ldots$, two basic cases need to be considered:

Case 1: Assume $x^t$ to be a non-zero element in $\{0, 1\}^{t+1}$, in which case $y^t$ is also a non-zero element in $\{0, 1\}^{t+1}$. By Proposition D.1, we get that (D.4) holds provided

$$P_{t - r(x^t)}(1) \le P_{t - r(y^t)}(1), \tag{D.44}$$

an inequality which is automatically satisfied by virtue of Proposition D.7 given that $r(x^t) \le r(y^t)$ whenever $x^t \le y^t$.

Case 2: Assume that $x^t$ is the zero element $0^t = (0, \ldots, 0)$ in $\{0, 1\}^{t+1}$ and note that

$$P[V_{t+1}^a(i) = 1 \mid (V_0^a(i), \ldots, V_t^a(i)) = 0^t] = P[X_{t+1}^a(i) = 1 \mid X_0^a(i) \ne 1, \ldots, X_t^a(i) \ne 1].$$

Invoking again Proposition D.1 for any non-zero element $y^t$ in $\{0, 1\}^{t+1}$, we see that the desired inequality (D.4) reduces to

$$P[X_{t+1}^a(i) = 1 \mid X_0^a(i) \ne 1, \ldots, X_t^a(i) \ne 1] \le P_{t - r(y^t)}(1), \tag{D.45}$$

and by Proposition D.7, it then clearly suffices to establish the inequality

$$P[X_{t+1}^a(i) = 1 \mid X_0^a(i) \ne 1, \ldots, X_t^a(i) \ne 1] \le P_t(1). \tag{D.46}$$

Conditioning on $X_0^a(i)$, we find

$$\begin{aligned}
P[X_{t+1}^a(i) = 1 \mid X_0^a(i) \ne 1, \ldots, X_t^a(i) \ne 1]
&= \sum_{\ell=2}^{N} P_t(\ell) \, P[X_0^a(i) = \ell \mid X_0^a(i) \ne 1, X_1^a(i) \ne 1, \ldots, X_t^a(i) \ne 1] \\
&\le P_t(1) \sum_{\ell=2}^{N} P[X_0^a(i) = \ell \mid X_0^a(i) \ne 1, X_1^a(i) \ne 1, \ldots, X_t^a(i) \ne 1] \\
&= P_t(1)
\end{aligned}$$

where the inequality follows from Proposition D.6. Thus, the required condition (D.46) holds. This completes the proof of the CIS property of the sequence $\{V_t^a(i), t = 0, 1, \ldots\}$.

Finally, since the sequence $\{V_t^a(i), t = 0, 1, \ldots\}$ is CIS for each $i = 1, \ldots, N$ and CIS implies PSMD, the desired comparison between $R^a$ and its independent version $\hat{R}^a$ follows from Proposition 9.2. ■

Appendix E


Proofs  of  Lemmas  10.1, 10.12, 11.1, 11.4  and  11.8 


E.1 A proof of Lemma 10.1

First, consider the case when the request stream $R = \{R_t, t = 0, 1, \ldots\}$ is stationary. In this case, we have for each $\tau = 1, 2, \ldots$ and for all $t \ge \tau - 1$ that

$$\begin{aligned}
S(t, \tau; R) &= |\{R_{(t-\tau+1)^+}, \ldots, R_t\}| \\
&= |\{R_{t-\tau+1}, \ldots, R_t\}| \\
&=_{st} |\{R_0, \ldots, R_{\tau-1}\}| \\
&= S(\tau - 1, \tau; R).
\end{aligned}$$

By letting $t$ go to infinity, we obtain (10.2) with $S(\tau; R) =_{st} S(\tau - 1, \tau; R)$.


Next, we show that the limit (10.1) exists for each $\tau = 1, 2, \ldots$. From the definition of the working set size, for $t \ge \tau - 1$, we can write

$$S(t, \tau; R) = \sum_{i=1}^{N} \left( 1 - \mathbf{1}[R_{t-\ell} \ne i, \ \ell = 0, \ldots, \tau - 1] \right). \tag{E.1}$$

Consequently, the limit (10.1) can be rewritten as

$$\begin{aligned}
\bar{S}(\tau; R) &= \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} S(t, \tau; R) \\
&= \lim_{T \to \infty} \frac{T - \tau + 1}{T} \cdot \frac{1}{T - \tau + 1} \sum_{t=\tau-1}^{T-1} S(t, \tau; R) \\
&= \lim_{T \to \infty} \frac{1}{T} \sum_{t=\tau-1}^{\tau+T-2} S(t, \tau; R) \\
&= \sum_{i=1}^{N} \left( 1 - \lim_{T \to \infty} \frac{1}{T} \sum_{t=\tau-1}^{\tau+T-2} \mathbf{1}[R_{t-\ell} \ne i, \ \ell = 0, \ldots, \tau - 1] \right).
\end{aligned} \tag{E.2}$$

Because the limits on the right-hand side of (E.2) are guaranteed to exist a.s. by the stationarity assumption on the request stream $R$ [62, Chap. 5], the limit (10.1) exists a.s. for each $\tau = 1, 2, \ldots$.

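In addition, if the request stream $\{R_t, t = 0, 1, \ldots\}$ is stationary and ergodic, then [62, Chap. 5] for each $i = 1, \ldots, N$,

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=\tau-1}^{\tau+T-2} \mathbf{1}[R_{t-\ell} \ne i, \ \ell = 0, \ldots, \tau - 1] = P[R_\ell \ne i, \ \ell = 0, \ldots, \tau - 1] \quad \text{a.s.},$$

and it follows from (E.1) and (E.2) that

$$\begin{aligned}
\bar{S}(\tau; R) &= \sum_{i=1}^{N} \left( 1 - P[R_\ell \ne i, \ \ell = 0, \ldots, \tau - 1] \right) \\
&= E[S(\tau - 1, \tau; R)] \\
&= E[S(\tau; R)], \quad \tau = 1, 2, \ldots.
\end{aligned}$$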


We now assume that the request stream $R = \{R_t, t = 0, 1, \ldots\}$ couples with a stationary sequence of $\mathcal{N}$-valued rvs $\tilde{R} = \{\tilde{R}_t, t = 0, 1, \ldots\}$. By coupling, we mean that there exists a coupling time $T^*$ such that $R_t = \tilde{R}_t$ for all $t \ge T^*$, with the $\{0, 1, \ldots\}$-valued rv $T^*$ being finite a.s. (see e.g., [45, 64]). Under this assumption, it holds for each $\tau = 1, 2, \ldots$ that

$$S(t, \tau; R) = S(t, \tau; \tilde{R}), \quad t \ge T^* + \tau - 1, \tag{E.3}$$

or equivalently, the sequence $\{S(t, \tau; R), t = 0, 1, \ldots\}$ couples with the sequence $\{S(t, \tau; \tilde{R}), t = 0, 1, \ldots\}$ where the coupling time is given by $T^* + \tau - 1$. By the first part of the proof, $S(t, \tau; \tilde{R}) \Longrightarrow_t S(\tau; \tilde{R})$ for each $\tau = 1, 2, \ldots$, and from (E.3), we get $S(t, \tau; R) \Longrightarrow_t S(\tau; R)$ with $S(\tau; R) =_{st} S(\tau; \tilde{R})$.

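By a similar argument, we find

$$\begin{aligned}
\lim_{T \to \infty} \frac{1}{T} \sum_{t=\tau-1}^{\tau+T-2} \mathbf{1}[R_{t-\ell} \ne i, \ \ell = 0, \ldots, \tau - 1]
&= \lim_{T \to \infty} \frac{1}{T} \sum_{t=\tau-1}^{T^*+\tau-2} \mathbf{1}[R_{t-\ell} \ne i, \ \ell = 0, \ldots, \tau - 1] \\
&\quad + \lim_{T \to \infty} \frac{T - T^*}{T} \cdot \frac{1}{T - T^*} \sum_{t=T^*+\tau-1}^{\tau+T-2} \mathbf{1}[\tilde{R}_{t-\ell} \ne i, \ \ell = 0, \ldots, \tau - 1] \\
&= \lim_{T \to \infty} \frac{1}{T} \sum_{t=\tau-1}^{\tau+T-2} \mathbf{1}[\tilde{R}_{t-\ell} \ne i, \ \ell = 0, \ldots, \tau - 1].
\end{aligned}$$

By virtue of (E.2), the limit (10.1) exists for each $\tau = 1, 2, \ldots$, and coincides with $\bar{S}(\tau; \tilde{R})$. Lastly, if the sequence $\tilde{R}$ is stationary and ergodic, the argument above yields

$$\bar{S}(\tau; R) = \bar{S}(\tau; \tilde{R}) = E[S(\tau; \tilde{R})] = E[S(\tau; R)]$$

for each $\tau = 1, 2, \ldots$. ■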
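The working set size and the indicator representation (E.1) translate directly into code. The following sketch is an illustration only: it assumes an i.i.d. (IRM) request stream with a made-up popularity pmf `p`, in which case $P[R_\ell \ne i, \ell = 0, \ldots, \tau - 1] = (1 - p_i)^\tau$, and all function names are hypothetical.

```python
import random

def working_set_size(trace, t, tau):
    """S(t, tau; R) = |{R_{(t-tau+1)^+}, ..., R_t}|."""
    start = max(t - tau + 1, 0)
    return len(set(trace[start : t + 1]))

def working_set_size_via_indicators(trace, t, tau, n_items):
    """Right-hand side of (E.1); meaningful for t >= tau - 1."""
    total = 0
    for i in range(1, n_items + 1):
        absent = all(trace[t - l] != i for l in range(tau))
        total += 1 - int(absent)
    return total

def average_working_set_size(trace, tau):
    """Finite-horizon version of the time average in (10.1)."""
    n = len(trace)
    return sum(working_set_size(trace, t, tau) for t in range(n)) / n

def irm_expected_working_set_size(p, tau):
    """sum_i (1 - (1 - p_i)^tau), valid for i.i.d. requests with pmf p."""
    return sum(1 - (1 - pi) ** tau for pi in p)

rng = random.Random(0)                 # fixed seed for reproducibility
p = [0.5, 0.3, 0.2]                    # hypothetical popularity pmf
trace = rng.choices([1, 2, 3], weights=p, k=50_000)
tau = 4
print(average_working_set_size(trace, tau),
      irm_expected_working_set_size(p, tau))
```

On a long i.i.d. trace the two printed values should be close, in line with the ergodic case of the lemma; a stream that is only stationary after a finite coupling time would give the same limit, since the initial segment is washed out by the factor $1/T$.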


E.2  A  proof  of  Lemma  10.12 

Fix $\tau = 1, 2, \ldots$. We first consider the case when the request stream $R = \{R_t, t = 0, 1, \ldots\}$ is stationary and ergodic. Fix $i = 1, \ldots, N$. Recalling from (10.20) and (10.21) that

$$g(V_{t-\tau}(i), \ldots, V_t(i)) = \mathbf{1}[R_t = i, \ R_{t-\ell} \ne i, \ \ell = 1, \ldots, \tau], \tag{E.4}$$

we can write

$$\begin{aligned}
\lim_{T \to \infty} \frac{1}{T} \sum_{t=\tau}^{T+\tau-1} g(V_{t-\tau}(i), \ldots, V_t(i))
&= \lim_{T \to \infty} \frac{1}{T} \sum_{t=\tau}^{T+\tau-1} \mathbf{1}[R_t = i, \ R_{t-\ell} \ne i, \ \ell = 1, \ldots, \tau] \\
&= P[R_\tau = i, \ R_\ell \ne i, \ \ell = 0, \ldots, \tau - 1] \quad \text{a.s.}
\end{aligned} \tag{E.5}$$

where the last equality is due to stationarity and ergodicity of the request stream $R$ [62, Chap. 5]. Consequently, the limit (10.23) exists and is given by

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=\tau}^{T+\tau-1} \sum_{i=1}^{N} g(V_{t-\tau}(i), \ldots, V_t(i)) = \sum_{i=1}^{N} P[R_\tau = i, \ R_\ell \ne i, \ \ell = 0, \ldots, \tau - 1],$$

whence the conclusion (10.25).

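Next, we assume that the request stream $R = \{R_t, t = 0, 1, \ldots\}$ couples with a stationary and ergodic sequence of $\mathcal{N}$-valued rvs $\tilde{R} = \{\tilde{R}_t, t = 0, 1, \ldots\}$. Let the $\{0, 1, \ldots\}$-valued rv $T^*$ be the coupling time, where $T^*$ is finite a.s. and $R_t = \tilde{R}_t$ for all $t \ge T^*$. Fix $i = 1, \ldots, N$ and let $\{\tilde{V}_t(i), t = 0, 1, \ldots\}$ be the indicator sequence associated with $\tilde{R}$ through (9.1). Under this assumption, it is plain from (E.4) that

$$g(V_{t-\tau}(i), \ldots, V_t(i)) = g(\tilde{V}_{t-\tau}(i), \ldots, \tilde{V}_t(i)), \quad t \ge T^* + \tau, \tag{E.6}$$

hence,

$$\begin{aligned}
\lim_{T \to \infty} \frac{1}{T} \sum_{t=\tau}^{T+\tau-1} g(V_{t-\tau}(i), \ldots, V_t(i))
&= \lim_{T \to \infty} \frac{1}{T} \sum_{t=\tau}^{T^*+\tau-1} g(V_{t-\tau}(i), \ldots, V_t(i)) \\
&\quad + \lim_{T \to \infty} \frac{T - T^*}{T} \cdot \frac{1}{T - T^*} \sum_{t=T^*+\tau}^{T+\tau-1} g(\tilde{V}_{t-\tau}(i), \ldots, \tilde{V}_t(i)) \\
&= \lim_{T \to \infty} \frac{1}{T} \sum_{t=\tau}^{T+\tau-1} g(\tilde{V}_{t-\tau}(i), \ldots, \tilde{V}_t(i)) \\
&= P[\tilde{R}_\tau = i, \ \tilde{R}_\ell \ne i, \ \ell = 0, \ldots, \tau - 1] \quad \text{a.s.}
\end{aligned} \tag{E.7}$$

where the last equality follows from (E.5).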

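As a result, the limit (10.23) exists and is given by

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=\tau}^{T+\tau-1} \sum_{i=1}^{N} g(V_{t-\tau}(i), \ldots, V_t(i)) = \sum_{i=1}^{N} P[\tilde{R}_\tau = i, \ \tilde{R}_\ell \ne i, \ \ell = 0, \ldots, \tau - 1]. \tag{E.8}$$

Upon noting that

$$\begin{aligned}
\lim_{t \to \infty} \sum_{i=1}^{N} E[g(V_{t-\tau}(i), \ldots, V_t(i))]
&= \lim_{t \to \infty} \sum_{i=1}^{N} P[R_t = i, \ R_{t-\ell} \ne i, \ \ell = 1, \ldots, \tau] \\
&= \sum_{i=1}^{N} P[\tilde{R}_\tau = i, \ \tilde{R}_\ell \ne i, \ \ell = 0, \ldots, \tau - 1],
\end{aligned}$$

the desired result (10.24) is immediate from (E.8). ■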
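The time average in (E.5) can be reproduced on a simulated trace. The sketch below is hedged in several ways: it assumes that (9.1) defines the indicator sequence as $V_t(i) = \mathbf{1}[R_t = i]$, it writes $g$ as the product form implied by (E.4), and it uses an i.i.d. (IRM) stream with a made-up pmf `p`, for which the limit equals $\sum_i p_i (1 - p_i)^\tau$.

```python
import random

def indicator_sequence(trace, i):
    """V_t(i) = 1[R_t = i] (assumed form of (9.1))."""
    return [1 if r == i else 0 for r in trace]

def g(window):
    """g(v_0, ..., v_tau) = v_tau * prod_{j < tau} (1 - v_j), which
    matches the indicator on the right-hand side of (E.4)."""
    *past, last = window
    return last * int(not any(past))

def time_average(trace, tau, n_items):
    """(1/T) * sum_{t=tau}^{T+tau-1} sum_i g(V_{t-tau}(i), ..., V_t(i))
    with T = len(trace) - tau."""
    T = len(trace) - tau
    V = {i: indicator_sequence(trace, i) for i in range(1, n_items + 1)}
    total = 0
    for t in range(tau, tau + T):
        total += sum(g(V[i][t - tau : t + 1]) for i in V)
    return total / T

def irm_limit(p, tau):
    """sum_i P[R_tau = i, R_l != i, l < tau] for i.i.d. requests."""
    return sum(pi * (1 - pi) ** tau for pi in p)

rng = random.Random(1)                 # fixed seed for reproducibility
p = [0.5, 0.3, 0.2]                    # hypothetical popularity pmf
trace = rng.choices([1, 2, 3], weights=p, k=50_000)
tau = 3
print(time_average(trace, tau, 3), irm_limit(p, tau))
```

At each time $t$ at most one item $i$ contributes a 1 to the inner sum, so the average is simply the fraction of requests whose item was not requested in the previous $\tau$ slots.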


E.3  A  proof  of  Lemma  11.1 

As in the proof of Lemma 10.1, we first assume that the request stream $R = \{R_t, t = 0, 1, \ldots\}$ is stationary. From the definition of the inter-reference time, we have for each $\tau = 1, 2, \ldots$ and $t = \tau, \tau + 1, \ldots$, that

$$\begin{aligned}
P[T(t; R) > \tau] &= P[R_{t-\ell} \ne R_t, \ \ell = 1, \ldots, \tau] \\
&= \sum_{i=1}^{N} P[R_t = i, \ R_{t-\ell} \ne i, \ \ell = 1, \ldots, \tau] && \text{(E.9)} \\
&= \sum_{i=1}^{N} P[R_\tau = i, \ R_\ell \ne i, \ \ell = 0, \ldots, \tau - 1] \\
&= P[T(\tau; R) > \tau], && \text{(E.10)}
\end{aligned}$$

where the third equality follows from the stationarity of the request stream $R$. By letting $t$ go to infinity in (E.10), we obtain $T(t; R) \Longrightarrow_t T(R)$ with $P[T(R) > \tau] = P[T(\tau; R) > \tau]$ for each $\tau = 1, 2, \ldots$.

Next, assume that the request stream $R$ is asymptotically stationary, i.e., $\{R_{t+\ell}, t = 0, 1, \ldots\} \Longrightarrow_\ell \{\tilde{R}_t, t = 0, 1, \ldots\}$ where $\tilde{R} = \{\tilde{R}_t, t = 0, 1, \ldots\}$ is a stationary sequence of $\mathcal{N}$-valued rvs. Under this assumption, we note for each $i = 1, \ldots, N$ that

$$\lim_{t \to \infty} P[R_t = i, \ R_{t-\ell} \ne i, \ \ell = 1, \ldots, \tau] = P[\tilde{R}_\tau = i, \ \tilde{R}_\ell \ne i, \ \ell = 0, \ldots, \tau - 1],$$

and invoking (E.9) thus yields

$$\lim_{t \to \infty} P[T(t; R) > \tau] = P[T(\tilde{R}) > \tau], \quad \tau = 1, 2, \ldots.$$

As a result, the weak convergence $T(t; R) \Longrightarrow_t T(R)$ holds with $T(R) =_{st} T(\tilde{R})$, i.e., $T(R)$ is characterized by setting $P[T(R) > \tau] = P[T(\tilde{R}) > \tau]$ for each $\tau = 1, 2, \ldots$.
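For a concrete trace, the inter-reference times and the empirical counterpart of $P[T(t; R) > \tau]$ can be computed as follows. This is a minimal sketch with hypothetical function names; it assumes the convention $T(t; R) = t + 1$ when $R_t$ has not been requested before, which is how the first-reference term appears in (E.17).

```python
def inter_reference_times(trace):
    """T(t; R) = min{tau >= 1 : R_{t-tau} = R_t}, and t + 1 when the
    item R_t has no earlier occurrence (assumed convention)."""
    last_seen = {}
    out = []
    for t, r in enumerate(trace):
        out.append(t - last_seen[r] if r in last_seen else t + 1)
        last_seen[r] = t
    return out

def empirical_tail(trace, tau):
    """Fraction of times t >= tau with T(t; R) > tau."""
    tail = inter_reference_times(trace)[tau:]
    return sum(1 for x in tail if x > tau) / len(tail)

example = [1, 2, 3, 2, 1, 1]           # hypothetical trace
print(inter_reference_times(example))  # [1, 2, 3, 2, 4, 1]
```

Only times $t \ge \tau$ enter `empirical_tail`, mirroring the range $t = \tau, \tau + 1, \ldots$ used in (E.9).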


E.4  A  proof  of  Lemma  11.4 

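Under the assumptions of the lemma, we note from Appendix E.3 that

$$\begin{aligned}
P[T(R) > \tau] &= P[T(\tilde{R}) > \tau] \\
&= \sum_{i=1}^{N} P[\tilde{R}_\tau = i, \ \tilde{R}_\ell \ne i, \ \ell = 0, \ldots, \tau - 1].
\end{aligned}$$

Consequently, for each $n = 0, 1, \ldots$, we find

$$\sum_{\tau=n}^{\infty} P[T(R) > \tau] = \sum_{i=1}^{N} \sum_{\tau=n}^{\infty} P[\tilde{R}_\tau = i, \ \tilde{R}_\ell \ne i, \ \ell = 0, \ldots, \tau - 1]. \tag{E.11}$$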

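First, we consider the expression (E.11) for $n = 0$, in which case $E[T(R)] = \sum_{\tau=0}^{\infty} P[T(R) > \tau]$. For each $k = 0, 1, \ldots$, we observe that

$$\begin{aligned}
\sum_{\tau=0}^{k} P[\tilde{R}_\tau = i, \ \tilde{R}_\ell \ne i, \ \ell = 0, \ldots, \tau - 1]
&= 1 - P[\tilde{R}_0 \ne i] + \sum_{\tau=1}^{k} P[\tilde{R}_\tau = i, \ \tilde{R}_\ell \ne i, \ \ell = 0, \ldots, \tau - 1] \\
&= 1 - P[\tilde{R}_0 \ne i, \ \tilde{R}_1 \ne i] + \sum_{\tau=2}^{k} P[\tilde{R}_\tau = i, \ \tilde{R}_\ell \ne i, \ \ell = 0, \ldots, \tau - 1] \\
&= 1 - P[\tilde{R}_\ell \ne i, \ \ell = 0, \ldots, k].
\end{aligned} \tag{E.12}$$

By letting $k$ go to infinity, we obtain

$$\begin{aligned}
\sum_{\tau=0}^{\infty} P[\tilde{R}_\tau = i, \ \tilde{R}_\ell \ne i, \ \ell = 0, \ldots, \tau - 1]
&= 1 - \lim_{k \to \infty} P[\tilde{R}_\ell \ne i, \ \ell = 0, \ldots, k] \\
&= 1
\end{aligned} \tag{E.13}$$

under the assumptions (4.2) and (4.3) that the popularity pmf $p$ of $R$ (which coincides with that of $\tilde{R}$) exists and is admissible. It is now immediate from (E.11) and (E.13) that

$$E[T(R)] = \sum_{i=1}^{N} \sum_{\tau=0}^{\infty} P[\tilde{R}_\tau = i, \ \tilde{R}_\ell \ne i, \ \ell = 0, \ldots, \tau - 1] = N.$$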

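From (E.12) and (E.13), it is plain that the expression (E.11) for the case $n = 1, 2, \ldots$ can be rewritten as

$$\begin{aligned}
\sum_{\tau=n}^{\infty} P[T(R) > \tau]
&= \sum_{i=1}^{N} \left( 1 - \sum_{\tau=0}^{n-1} P[\tilde{R}_\tau = i, \ \tilde{R}_\ell \ne i, \ \ell = 0, \ldots, \tau - 1] \right) \\
&= \sum_{i=1}^{N} P[\tilde{R}_\ell \ne i, \ \ell = 0, \ldots, n - 1],
\end{aligned}$$

whence the desired result.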
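As a sanity check on these tail sums, consider the special case of an i.i.d. (IRM) stream, which is an assumption made here purely for illustration. Then $P[\tilde{R}_\tau = i, \tilde{R}_\ell \ne i, \ell = 0, \ldots, \tau - 1] = p_i (1 - p_i)^\tau$ and $P[\tilde{R}_\ell \ne i, \ell = 0, \ldots, n - 1] = (1 - p_i)^n$, so both sides can be evaluated numerically. The pmf `p`, the truncation length and the function names below are hypothetical.

```python
def tail_sum(p, n, terms=5_000):
    """Truncated version of sum_{tau >= n} sum_i p_i (1 - p_i)^tau,
    the IRM form of the right-hand side of (E.11)."""
    return sum(pi * (1 - pi) ** tau
               for tau in range(n, n + terms)
               for pi in p)

def closed_form(p, n):
    """sum_i (1 - p_i)^n, the IRM form of sum_i P[R_l != i, l < n]."""
    return sum((1 - pi) ** n for pi in p)

p = [0.5, 0.3, 0.2]                    # hypothetical admissible pmf (all p_i > 0)
for n in range(5):
    assert abs(tail_sum(p, n) - closed_form(p, n)) < 1e-9

# n = 0 recovers E[T(R)] = N.
print(tail_sum(p, 0), len(p))
```

The geometric tails make the truncation error negligible here; admissibility matters because an item with $p_i = 0$ would contribute nothing to the left-hand side but a constant 1 to the right-hand side.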


E.5  A  proof  of  Lemma  11.8 

To  establish  Lemma  11.8,  we  shall  make  use  of  the  following 

Lemma E.1 For a request stream $R = \{R_t, t = 0, 1, \ldots\}$ with admissible popularity pmf $p$, it holds for each $i = 1, \ldots, N$ and for each $k = 1, \ldots, N$ that

$$\lim_{t \to \infty} P[R_t = i, \ R_\ell \ne i, \ \ell = 0, \ldots, t - 1, \ |\{R_0, \ldots, R_t\}| = k] = 0. \tag{E.14}$$

Proof. For each $i = 1, \ldots, N$ and $k = 1, \ldots, N$, it holds that

$$P[R_t = i, \ R_\ell \ne i, \ \ell = 0, \ldots, t - 1, \ |\{R_0, \ldots, R_t\}| = k] \le P[R_t = i, \ R_\ell \ne i, \ \ell = 0, \ldots, t - 1], \quad t = 1, 2, \ldots, \tag{E.15}$$

and that

$$\lim_{t \to \infty} P[R_t = i, \ R_\ell \ne i, \ \ell = 0, \ldots, t - 1] = 0 \tag{E.16}$$

under the assumptions (4.2) and (4.3) that the popularity pmf $p$ of $R$ exists and is admissible. Combining (E.15) and (E.16) simply yields (E.14). ■


Proof of Lemma 11.8. First, we assume that the request stream $R = \{R_t, t = 0, 1, \ldots\}$ is stationary. Fix $k = 1, \ldots, N$. For each $t = 0, 1, \ldots$, the definition of the stack distance gives

$$\begin{aligned}
P[D(t; R) = k]
&= P[\, |\{R_{t - T(t;R) + 1}, \ldots, R_t\}| = k \,] \\
&= \sum_{\tau=1}^{t+1} P[T(t; R) = \tau, \ |\{R_{t-\tau+1}, \ldots, R_t\}| = k] \\
&= \sum_{\tau=1}^{t} \sum_{i=1}^{N} P[R_t = R_{t-\tau} = i, \ R_{t-\ell} \ne i, \ \ell = 1, \ldots, \tau - 1, \ |\{R_{t-\tau+1}, \ldots, R_t\}| = k] \\
&\quad + \sum_{i=1}^{N} P[R_t = i, \ R_\ell \ne i, \ \ell = 0, \ldots, t - 1, \ |\{R_0, \ldots, R_t\}| = k] && \text{(E.17)} \\
&= \sum_{i=1}^{N} \sum_{\tau=1}^{t} P[R_\tau = R_0 = i, \ R_\ell \ne i, \ \ell = 1, \ldots, \tau - 1, \ |\{R_1, \ldots, R_\tau\}| = k] \\
&\quad + \sum_{i=1}^{N} P[R_t = i, \ R_\ell \ne i, \ \ell = 0, \ldots, t - 1, \ |\{R_0, \ldots, R_t\}| = k] && \text{(E.18)}
\end{aligned}$$

where the last equality follows from the stationarity of the request stream $R$.


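We now verify the existence of the limit of (E.18) as $t$ goes to infinity. For each $i = 1, \ldots, N$ and $t = 1, 2, \ldots$, we have

$$\begin{aligned}
\gamma_{k,t}(i) &:= \sum_{\tau=1}^{t} P[R_\tau = R_0 = i, \ R_\ell \ne i, \ \ell = 1, \ldots, \tau - 1, \ |\{R_1, \ldots, R_\tau\}| = k] \\
&\le \sum_{\tau=1}^{t} P[R_\tau = R_0 = i, \ R_\ell \ne i, \ \ell = 1, \ldots, \tau - 1] \\
&\le \sum_{\tau=1}^{\infty} P[R_\tau = R_0 = i, \ R_\ell \ne i, \ \ell = 1, \ldots, \tau - 1] \\
&= P[R_0 = i].
\end{aligned}$$

Consequently, for each $i = 1, \ldots, N$, the monotone sequence $\{\gamma_{k,t}(i), t = 1, 2, \ldots\}$ is bounded above by $P[R_0 = i]$, thus its limit exists, is finite and is given by

$$\begin{aligned}
\gamma_k(i) &:= \lim_{t \to \infty} \gamma_{k,t}(i) \\
&= \sum_{\tau=1}^{\infty} P[R_\tau = R_0 = i, \ R_\ell \ne i, \ \ell = 1, \ldots, \tau - 1, \ |\{R_1, \ldots, R_\tau\}| = k].
\end{aligned}$$

Combining this fact with (E.18) and Lemma E.1 yields

$$\lim_{t \to \infty} P[D(t; R) = k] = \sum_{i=1}^{N} \gamma_k(i), \quad k = 1, \ldots, N,$$

whence $D(t; R) \Longrightarrow_t D(R)$ with $D(R)$ characterized by setting $P[D(R) = k] = \sum_{i=1}^{N} \gamma_k(i)$ for each $k = 1, \ldots, N$.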

Now, assume that the request stream $R$ is asymptotically stationary, i.e., $\{R_{t+\ell}, t = 0, 1, \ldots\} \Longrightarrow_\ell \{\tilde{R}_t, t = 0, 1, \ldots\}$ where $\tilde{R} = \{\tilde{R}_t, t = 0, 1, \ldots\}$ is a stationary sequence of $\mathcal{N}$-valued rvs. Fix $k = 1, \ldots, N$. Under this assumption, we note that

$$\begin{aligned}
&\lim_{t \to \infty} P[R_t = R_{t-\tau} = i, \ R_{t-\ell} \ne i, \ \ell = 1, \ldots, \tau - 1, \ |\{R_{t-\tau+1}, \ldots, R_t\}| = k] \\
&\qquad = P[\tilde{R}_\tau = \tilde{R}_0 = i, \ \tilde{R}_\ell \ne i, \ \ell = 1, \ldots, \tau - 1, \ |\{\tilde{R}_1, \ldots, \tilde{R}_\tau\}| = k]
\end{aligned} \tag{E.19}$$

for each $i = 1, \ldots, N$ and $\tau = 1, 2, \ldots$.

We shall establish the existence of the limit of $P[D(t; R) = k]$ as $t$ goes to infinity by using the expression (E.17). As in the first part of the proof, for each $i = 1, \ldots, N$, it is plain that


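$$\begin{aligned}
\gamma_{k,t}(i) &:= \sum_{\tau=1}^{t} P[R_t = R_{t-\tau} = i, \ R_{t-\ell} \ne i, \ \ell = 1, \ldots, \tau - 1, \ |\{R_{t-\tau+1}, \ldots, R_t\}| = k] \\
&\le \sum_{\tau=1}^{t} P[R_t = R_{t-\tau} = i, \ R_{t-\ell} \ne i, \ \ell = 1, \ldots, \tau - 1] \\
&\le P[R_t = i], \quad t = 1, 2, \ldots,
\end{aligned}$$

and the monotone sequence $\{\gamma_{k,t}(i), t = 1, 2, \ldots\}$ is bounded above by 1. Consequently, for each $i = 1, \ldots, N$, $\lim_{t \to \infty} \gamma_{k,t}(i)$ exists, is finite and is given by

$$\begin{aligned}
\lim_{t \to \infty} \gamma_{k,t}(i)
&= \lim_{t \to \infty} \sum_{\tau=1}^{t} P[R_t = R_{t-\tau} = i, \ R_{t-\ell} \ne i, \ \ell = 1, \ldots, \tau - 1, \ |\{R_{t-\tau+1}, \ldots, R_t\}| = k] \\
&= \sum_{\tau=1}^{\infty} P[\tilde{R}_\tau = \tilde{R}_0 = i, \ \tilde{R}_\ell \ne i, \ \ell = 1, \ldots, \tau - 1, \ |\{\tilde{R}_1, \ldots, \tilde{R}_\tau\}| = k]
\end{aligned} \tag{E.20}$$

as we make use of (E.19).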

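By virtue of Lemma E.1 and (E.20), it now follows from (E.17) that

$$\begin{aligned}
\lim_{t \to \infty} P[D(t; R) = k]
&= \sum_{i=1}^{N} \sum_{\tau=1}^{\infty} P[\tilde{R}_\tau = \tilde{R}_0 = i, \ \tilde{R}_\ell \ne i, \ \ell = 1, \ldots, \tau - 1, \ |\{\tilde{R}_1, \ldots, \tilde{R}_\tau\}| = k] \\
&= P[D(\tilde{R}) = k], \quad k = 1, \ldots, N,
\end{aligned}$$

and $D(t; R) \Longrightarrow_t D(R)$ with $D(R) =_{st} D(\tilde{R})$, i.e., $P[D(R) = k] = P[D(\tilde{R}) = k]$ for each $k = 1, \ldots, N$. ■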
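The definition of the stack distance used at the start of this proof, $D(t; R) = |\{R_{t - T(t;R) + 1}, \ldots, R_t\}|$, can be evaluated directly on a trace. The sketch below is a naive illustration with hypothetical function names; it again assumes $T(t; R) = t + 1$ on a first reference, so that the window is $\{R_0, \ldots, R_t\}$ in that case, as in the last term of (E.17).

```python
from collections import Counter

def stack_distances(trace):
    """D(t; R): number of distinct items in the window that starts just
    after the previous request for R_t (or at time 0 on a first
    reference) and ends at time t."""
    last_seen = {}
    out = []
    for t, r in enumerate(trace):
        start = last_seen[r] + 1 if r in last_seen else 0
        out.append(len(set(trace[start : t + 1])))
        last_seen[r] = t
    return out

def empirical_pmf(values):
    """Relative frequency of each observed stack distance."""
    counts = Counter(values)
    n = len(values)
    return {k: counts[k] / n for k in sorted(counts)}

example = [1, 2, 3, 2, 1, 1]           # hypothetical trace
print(stack_distances(example))        # [1, 2, 3, 2, 3, 1]
print(empirical_pmf(stack_distances(example)))
```

Recomputing the set for every request costs time proportional to the window length; this keeps the code close to the definition, whereas an implementation meant for long traces would maintain an explicit LRU stack instead.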
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